PREFACE TO THE THIRD EDITION


THE book has again been mostly rewritten to bring in various improvements. 
The chief of these is the use of the notation of bra and ket vectors, which I have 
developed since 1939. This notation allows a more direct connexion to be made 
between the formalism in terms of the abstract quantities corresponding to states 
and observables and the formalism of representatives—in fact the two formalisms 
become welded into a single comprehensive scheme. With the help of this notation 
several of the deductions in the book take a simpler and neater form. 
Other substantial alterations include: 


(i) A new presentation of the theory of systems with similar particles, based on 
Fock’s treatment of the theory of radiation adapted to the present notation. 
This treatment is simpler and more powerful than the one given in earlier 
editions of the book.


(ii) A further development of quantum electrodynamics, including the theory 
of the Wentzel field. The theory of the electron in interaction with 
the electromagnetic field is carried as far as it can be at the present time 
without getting on to speculative ground. 


P. A. M. D. 
ST. JOHN’S COLLEGE, CAMBRIDGE 
21 April 1947 
FROM THE PREFACE TO
THE SECOND EDITION 


THE book has been mostly rewritten. I have tried by carefully overhauling 
the method of presentation to give the development of the theory in a rather 
less abstract form, without making any sacrifices in exactness of expression or in 
the logical character of the development. This should make the work suitable for 
a wider circle of readers, although the reader who likes abstractness for its own 
sake may possibly prefer the style of the first edition. 

The main change has been brought about by the use of the word ‘state’ in 
a three-dimensional non-relativistic sense. It would seem at first sight a pity 
to build up the theory largely on the basis of non-relativistic concepts. The use 
of the non-relativistic meaning of ‘state’, however, contributes so essentially to 
the possibilities of clear exposition as to lead one to suspect that the fundamental 
ideas of the present quantum mechanics are in need of serious alteration at 
just this point, and that an improved theory would agree more closely with 
the development here given than with a development which aims at preserving 
the relativistic meaning of ‘state’ throughout.

P. A. M. D. 

THE INSTITUTE FOR ADVANCED STUDY, PRINCETON 

27 November 1934 
FROM THE PREFACE TO THE FIRST EDITION


THE methods of progress in theoretical physics have undergone a vast change 
during the Twentieth Century. The classical tradition has been to consider 
the world to be an association of observable objects (particles, fluids, 
fields, &c.) moving about according to definite laws of force, so that one could 
form a mental picture in space and time of the whole scheme. This led to 
a physics whose aim was to make assumptions about the mechanism and forces 
connecting these observable objects, to account for their behaviour in the simplest 
possible way. It has become increasingly evident in recent times, however, 
that nature works on a different plan. Her fundamental laws do not govern 
the world as it appears in our mental picture in any very direct way, but instead 
they control a substratum of which we cannot form a mental picture without 
introducing irrelevancies. The formulation of these laws requires the use of 
the mathematics of transformations. The important things in the world appear 
as the invariants (or more generally the nearly invariants, or quantities with 
simple transformation properties) of these transformations. The things we are 
immediately aware of are the relations of these nearly invariants to a certain frame 
of reference, usually one chosen so as to introduce special simplifying features 
which are unimportant from the point of view of general theory. 

The growth of the use of transformation theory, as applied first to relativity and 
later to the quantum theory, is the essence of the new method in theoretical physics. 
Further progress lies in the direction of making our equations invariant under wider 
and still wider transformations. This state of affairs is very satisfactory from 
a philosophical point of view, as implying an increasing recognition of the part 
played by the observer, by observing,† introducing the regularities that appear in
the observations, and a lack of arbitrariness in the ways of nature, but it makes 
things less easy for the learner of physics. The new theories, if one looks apart 
from their mathematical setting, are built up from physical concepts which cannot 
be explained in terms of things previously known to the student, which cannot 
even be explained adequately in words at all. Like the fundamental concepts 
(e.g. proximity, identity) which every one must learn on one’s arrival into the world, 
the newer concepts of physics can be mastered only by long familiarity with their 
properties and uses. 

From the mathematical side the approach to the new theories presents 
no difficulties, as the mathematics required (at any rate that which is required for 
the development of physics up to the ‘early Twentieth Century’) is not essentially 
different from what has been current for a considerable time. Mathematics is 
the tool specially suited for dealing with abstract concepts of any kind and there 
is no limit to its power in this field. For this reason a book on the new physics, 


†[‘in himself’ replaced with ‘by observing’.]




if not purely descriptive of experimental work, must be essentially mathematical. 
All the same the mathematics is only a tool and one should learn to hold 
the physical ideas in one’s mind without reference to the mathematical form. 
In this book I have tried to keep the physics to the forefront, by beginning with 
an entirely physical chapter and in the later work examining the physical meaning 
underlying the formalism wherever possible. The amount of theoretical ground 
one has to cover before being able to solve problems of real practical value is 
rather large, but this circumstance is an inevitable consequence of the fundamental 
part played by transformation theory and is likely to become more pronounced in 
the theoretical physics of the future. 

With regard to the mathematical form in which the theory can be presented, 
an author must decide at the outset between two methods. There is 
the symbolic method, which deals directly in an abstract way with the quantities 
of fundamental importance (the invariants, &c., of the transformations) and 
there is the method of co-ordinates or representations, which deals with sets 
of numbers corresponding to these quantities. The second of these has usually 
been used for the presentation of quantum mechanics (in fact it has been used 
practically exclusively with the exception of Hermann Weyl’s book Gruppentheorie 
und Quantenmechanik*). It is known under one or other of the two names
‘Wave Mechanics’ and ‘Matrix Mechanics’ according to which physical things 
receive emphasis in the treatment, the states of a system or its dynamical variables. 
It has the advantage that the kind of mathematics required is more familiar to 
the average student, and also it is the historical method. 

The symbolic method, however, seems to go more deeply into the nature 
of things. It enables one to express the physical laws in a neat and concise way, 
and will probably be increasingly used in the future as it becomes better 
understood and its own special mathematics gets developed. For this reason I have 
chosen the symbolic method, introducing the representatives later merely as an aid 
to practical calculation. This has necessitated a complete break from the historical 
line of development, but this break is an advantage through enabling the approach 
to the new ideas to be as direct as possible. 

P. A. M. D. 

ST. JOHN’S COLLEGE, CAMBRIDGE 

29 May 1930 


*[The Theory of Groups and Quantum Mechanics by Hermann Weyl, second edition, translated
1932 by Howard Percy Robertson (1903-1961), Library of Congress Control Number 32-2928]
I. THE PRINCIPLE OF
SUPERPOSITION 


1. The need for a quantum theory 
CLASSICAL mechanics has been developed continuously from the time of 
Sir Isaac Newton and applied to an ever-widening range of dynamical systems, 
including the electromagnetic field in interaction with matter. The underlying 
ideas and the laws governing their application form a simple and elegant scheme, 
which one would be inclined to think could not be seriously modified without 
having all its attractive features spoilt. Nevertheless it has been found possible 
to set up a new scheme, called quantum mechanics, which is more suitable for 
the description of phenomena on the atomic scale and which is in some respects 
more elegant and satisfying than the classical scheme. This possibility is due to 
the changes which the new scheme involves being of a very profound character and 
not clashing with the features of the classical theory that make it so attractive, 
as a result of which all these features can be incorporated in the new scheme. 
The necessity for a departure from classical mechanics is clearly shown 
by experimental results. In the first place the forces known in classical 
electrodynamics are inadequate for the explanation of the remarkable stability 
of atoms and molecules, which is necessary in order that materials may have 
any definite physical and chemical properties at all. The introduction of 
new hypothetical forces will not save the situation, since there exist general 
principles of classical mechanics, holding for all kinds of forces, leading to 
results in direct disagreement with observation. For example, if an atomic 
system has its equilibrium disturbed in any way and is then left alone, 
it will be set in oscillation and the oscillations will get impressed on 
the surrounding electromagnetic field, so that their frequencies may be observed 
with a spectroscope. Now whatever the laws of force governing the equilibrium, 
one would expect to be able to include the various frequencies in a scheme 
comprising certain fundamental frequencies and their harmonics. This is not
observed to be the case. Instead, there is observed a new and unexpected connexion
between the frequencies, called Ritz’s Combination Law of Spectroscopy,† according
to which all the frequencies can be expressed as differences between certain terms, 
the number of terms being much less than the number of frequencies. This law is 
quite unintelligible from the classical standpoint. 
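
The counting behind the Combination Law may be illustrated by a minimal sketch. The term values below are hypothetical numbers in arbitrary units, chosen only for illustration, and the function name is likewise an assumption of the sketch. It shows that a handful of terms generates a considerably larger set of difference frequencies; in a real spectrum not every difference need actually be observed.

```python
from itertools import combinations


def combination_frequencies(terms):
    """Return every positive difference between two spectral terms."""
    return sorted(abs(a - b) for a, b in combinations(terms, 2))


# Hypothetical term values in arbitrary units (not measured data).
terms = [0.0, 10.0, 13.0, 14.5, 15.2]
frequencies = combination_frequencies(terms)

# n terms can account for up to n(n-1)/2 frequencies.
print(len(terms), "terms give", len(frequencies), "frequencies")
```

With five terms the sketch yields ten frequencies, and the disproportion grows rapidly as further terms are added.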


†[‘On a New Law of Series Spectra,’ Ritz, W. The Astrophysical Journal 28 (1908) p. 237
| https://ui.adsabs.harvard.edu/abs/1908ApJ....28..237R | doi:10.1086/141591]
One might try to get over the difficulty without departing from classical
mechanics by assuming each of the spectroscopically observed frequencies to be 
a fundamental frequency with its own degree of freedom, the laws of force being 
such that the harmonic vibrations do not occur. Such a theory will not do, however, 
even apart from the fact that it would give no explanation of the Combination Law, 
since it would immediately bring one into conflict with the experimental evidence 
on specific heats. Classical statistical mechanics enables one to establish a general 
connexion between the total number of degrees of freedom of an assembly of 
vibrating systems and its specific heat. If one assumes all the spectroscopic 
frequencies of an atom to correspond to different degrees of freedom, one would get 
a specific heat for any kind of matter very much greater than the observed value. 
In fact the observed specific heats at ordinary temperatures are given fairly well 
by a theory that takes into account merely the motion of each atom as a whole 
and assigns no internal motion to it at all. 

This leads us to a new clash between classical mechanics and the results 
of experiment. There must certainly be some internal motion in an atom 
to account for its spectrum, but the internal degrees of freedom, for some classically 
inexplicable reason, do not contribute to the specific heat. A similar clash is 
found in connexion with the energy of oscillation of the electromagnetic field 
in a vacuum. Classical mechanics requires the specific heat corresponding to 
this energy to be infinite, but it is observed to be quite finite. A general conclusion 
from experimental results is that oscillations of high frequency do not contribute 
their classical quota to the specific heat. 

As another illustration of the failure of classical mechanics we may consider 
the behaviour of light. We have, on the one hand, the phenomena of interference 
and diffraction, which can be explained only on the basis of a wave theory; 
on the other, phenomena such as photo-electric emission and scattering by free 
electrons, which show that light is composed of small particles. These particles, 
which are called photons, have each a definite energy and momentum, depending 
on the frequency of the light, and appear to have just as real an existence as 
electrons, or any other particles known in physics. A fraction of a photon is 
never observed. 

Experiments have shown that this anomalous behaviour is not peculiar to light, 
but is quite general. All material particles have wave properties, which can be 
exhibited under suitable conditions. We have here a very surprising* and general 
example of the breakdown of classical mechanics—not merely an inaccuracy in its 
laws of motion, but an inadequacy of its concepts to supply us with a description 
of atomic events. 


*[‘surprising’ replaces ‘striking’]
The necessity to depart from classical ideas when one wishes to account for
the ultimate structure of matter may be seen, not only from experimentally 
established facts, but also from general philosophical grounds. In a classical 
explanation of the constitution of matter, one would assume it to be made up 
of a large number of small constituent parts and one would postulate laws for 
the behaviour of these parts, from which the laws of the matter in bulk could 
be deduced. This would not complete the explanation, however, since the question 
of the structure and stability of the constituent parts is left untouched. To go 
into this question, it becomes necessary to postulate that each constituent part 
is itself made up of smaller parts, in terms of which its behaviour is to be 
explained. There is clearly no end to this procedure, so that one can never arrive 
at the ultimate structure of matter on these lines. So long as big and small are 
merely relative concepts, it is no help to explain the big in terms of the small. It is 
therefore necessary to modify classical ideas in such a way as to give an absolute 
meaning to size. 

At this stage it becomes important to remember that science is concerned only 
with observable things and that we can observe an object only by letting it interact 
with some outside influence. An act of observation is thus necessarily accompanied 
by some disturbance of the object observed. We may define an object to be 
big when the disturbance accompanying our observation of it may be neglected, 
and small when the disturbance cannot be neglected. This definition is in close 
agreement with the common meanings of big and small. 

It is usually assumed that, by being careful, we may cut down the disturbance 
accompanying our observation to any desired extent. The concepts of big and small 
are then purely relative and refer to the gentleness of our means of observation 
as well as to the object being described. In order to give an absolute meaning 
to size, such as is required for any theory of the ultimate structure of matter, 
we have to assume that there is a limit to the fineness of our powers of observation 
and the smallness of the accompanying disturbance—a limit which is inherent 
in the nature of things and can never be surpassed by improved technique or 
increased skill on the part of the observer. If the object under observation is 
such that the unavoidable limiting disturbance is negligible, then the object 
is big in the absolute sense and we may apply classical mechanics to it. 
If, on the other hand, the limiting disturbance is not negligible, then the object is 
small in the absolute sense and we require a new theory for dealing with it. 

A consequence of the preceding discussion is that we must revise our ideas of 
causality. Causality applies only to a system which is left undisturbed. If a system 
is small, we cannot observe it without producing a serious disturbance and 
hence we cannot expect to find any causal connexion between the results of our 
observations. Causality will still be assumed to apply to undisturbed systems 


and the equations which will be set up to describe an undisturbed system will be
differential equations expressing a causal connexion between conditions at one time 
and conditions at a later time. These equations will be in close correspondence 
with the equations of classical mechanics, but they will be connected only 
indirectly with the results of observations. There is an unavoidable indeterminacy 
in the calculation of observational results, the theory enabling us to calculate in 
general only the probability of our obtaining a particular result when we make 
an observation. 


2. The polarization of photons 

The discussion in the preceding section about the limit to the gentleness with which 
observations can be made and the consequent indeterminacy in the results of those 
observations does not provide any quantitative basis for the building up of quantum 
mechanics. For this purpose a new set of accurate laws of nature is required. 
One of the most fundamental and most drastic of these is the Principle of 
Superposition of States. We shall lead up to a general formulation of this principle 
through a consideration of some special cases, taking first the example provided 
by the polarization of light. 

It is known experimentally that when plane-polarized light is used for 
ejecting photo-electrons, there is a preferential direction for the electron emission. 
Thus the polarization properties of light are closely connected with its corpuscular 
properties and one must ascribe a polarization to the photons. One must consider, 
for instance, a beam of light plane-polarized in a certain direction as consisting of 
photons each of which is plane-polarized in that direction and a beam of circularly 
polarized light as consisting of photons each circularly polarized. Every photon 
is in a certain state of polarization, as we shall say. The problem we must now 
consider is how to fit in these ideas with the known facts about the resolution of 
light into polarized components and the recombination of these components. 

Let us take a definite case. Suppose we have a beam of light passing through 
a crystal of tourmaline, which has the property of letting through only light 
plane-polarized perpendicular to its optic axis. Classical electrodynamics tells 
us what will happen for any given polarization of the incident beam. If this beam 
is polarized perpendicular to the optic axis, it will all go through the crystal; 
if parallel to the axis, none of it will go through; while if polarized at an angle α to
the axis, a fraction sin²α will go through. How are we to understand these results
on a photon basis? 

A beam that is plane-polarized in a certain direction is to be pictured as 
made up of photons each plane-polarized in that direction. This picture leads 
to no difficulty in the cases when our incident beam is polarized perpendicular or 
parallel to the optic axis. We merely have to suppose that each photon polarized 
perpendicular to the axis passes unhindered and unchanged through the crystal, 
while each photon polarized parallel to the axis is stopped and absorbed.
A difficulty arises, however, in the case of the obliquely polarized incident beam. 
Each of the incident photons is then obliquely polarized and it is not clear what 
will happen to such a photon when it reaches the tourmaline. 

A question about what will happen to a particular photon under certain 
conditions is not really very precise. To make it precise one must imagine some 
experiment performed having a bearing on the question and inquire what will be 
the result of the experiment. Only questions about the results of experiments 
have a real significance and it is only such questions that theoretical physics has 
to consider. 

In our present example the obvious experiment is to use an incident beam 
consisting of only a single photon and to observe what appears on the back side 
of the crystal. According to quantum mechanics the result of this experiment will 
be that sometimes one will find a whole photon, of energy equal to the energy 
of the incident photon, on the back side and other times one will find nothing. 
When one finds a whole photon, it will be polarized perpendicular to the optic axis. 
One will never find only a part of a photon on the back side. If one repeats 
the experiment a large number of times, one will find the photon on the back side in 
a fraction sin²α of the total number of times. Thus we may say that the photon has
a probability sin²α of passing through the tourmaline and appearing on the back
side polarized perpendicular to the axis and a probability cos²α of being absorbed.
These values for the probabilities lead to the correct classical results for an incident 
beam containing a large number of photons. 
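
The passage from single-photon probabilities to the classical fraction may be checked by a small numerical sketch. It assumes nothing beyond the rule stated above, that each photon independently passes with probability sin²α; the function name, the seed and the number of photons are choices made for the illustration only.

```python
import math
import random


def transmitted_fraction(alpha, n_photons, seed=0):
    """Send n_photons one at a time; each passes with probability sin^2(alpha)."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    p_pass = math.sin(alpha) ** 2
    # Each photon is either found whole on the back side or not at all.
    passed = sum(1 for _ in range(n_photons) if rng.random() < p_pass)
    return passed / n_photons


alpha = math.radians(30.0)  # angle between the polarization and the optic axis
fraction = transmitted_fraction(alpha, 100_000)
print(fraction, math.sin(alpha) ** 2)
```

For a large number of photons the counted fraction approaches the classical value sin²α, although the fate of any individual photon remains undetermined.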

In this way we preserve the individuality of the photon in all cases. 
We are able to do this, however, only because we abandon the determinacy 
of the classical theory. The result of an experiment is not determined, 
as it would be according to classical ideas, by the conditions under the control 
of the experimenter. The most that can be predicted is a set of possible results, 
with a probability of occurrence for each. 

The foregoing discussion about the result of an experiment with a single 
obliquely polarized photon incident on a crystal of tourmaline answers all that 
can legitimately be asked about what happens to an obliquely polarized photon 
when it reaches the tourmaline. Questions about what decides whether the photon 
is to go through or not and how it changes its direction of polarization when 
it does go through cannot be investigated by experiment and should be regarded 
as outside the domain of science, at least for this discussion.† Nevertheless some
further description is necessary in order to correlate the results of this experiment 
with the results of other experiments that might be performed with photons and 
to fit them all into a general scheme. Such further description should be regarded, 


†[The limitation to this discussion has been added]
not as an attempt to answer questions outside the domain of science, but as an aid
to the formulation of rules for expressing concisely the results of large numbers 
of experiments. 

The further description provided by quantum mechanics runs as follows. It is 
supposed that a photon polarized obliquely to the optic axis may be regarded 
as being partly in the state of polarization parallel to the axis and partly 
in the state of polarization perpendicular to the axis. The state of oblique 
polarization may be considered as the result of some kind of superposition process 
applied to the two states of parallel and perpendicular polarization. This implies 
a certain special kind of relationship between the various states of polarization, 
a relationship similar to that between polarized beams in classical optics, but which 
is now to be applied, not to beams, but to the states of polarization of one 
particular photon. This relationship allows any state of polarization to be 
resolved into, or expressed as a superposition of, any two mutually perpendicular 
states of polarization. 
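
The resolution just described may be written compactly. The symbols below are assumptions introduced for this sketch and are not notation fixed at this point of the text: e_∥ and e_⊥ stand for the states of polarization parallel and perpendicular to the optic axis, and e_α for the state polarized at an angle α to the axis.

```latex
\mathbf{e}_{\alpha} = \cos\alpha \,\mathbf{e}_{\parallel} + \sin\alpha \,\mathbf{e}_{\perp},
\qquad
P_{\parallel} = \cos^{2}\alpha , \quad
P_{\perp} = \sin^{2}\alpha , \quad
P_{\parallel} + P_{\perp} = 1 .
```

The squares of the two coefficients reproduce the probabilities cos²α of absorption and sin²α of transmission found above for the tourmaline crystal.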

When we make the photon meet a tourmaline crystal, we are subjecting it to 
an observation. We are observing whether it is polarized parallel or perpendicular 
to the optic axis. The effect of making this observation is to force the photon 
entirely into the state of parallel or entirely into the state of perpendicular 
polarization. It has to make a sudden jump from being partly in each of these 
two states to being entirely in one or other of them. Which of the two states it will 
jump into cannot be predicted, but is governed only by probability laws. If it jumps 
into the parallel state it gets absorbed and if it jumps into the perpendicular state 
it passes through the crystal and appears on the other side preserving this state 
of polarization. 


3. Interference of photons 

In this section we shall deal with another example of superposition. We shall 
again take photons, but shall be concerned with their position in space and 
their momentum instead of their polarization. If we are given a beam of roughly 
monochromatic light, then we know something about the location and momentum 
of the associated photons. We know that each of them is located somewhere in 
the region of space through which the beam is passing and has a momentum in 
the direction of the beam of magnitude given in terms of the frequency of the beam 


by Albert Einstein’s photo-electric law: momentum equals frequency multiplied
by a universal constant.* When we have such information about the location and
momentum of a photon we shall say that it is in a definite translational state. 
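
In the usual symbols, which are assumed here and not introduced in the text at this point (ν the frequency and λ the wave-length of the light, h Planck’s constant, c the velocity of light), the relations referred to read:

```latex
E = h\nu , \qquad p = \frac{h\nu}{c} = \frac{h}{\lambda} .
```

The universal constant multiplying the frequency in the expression for the momentum is thus h/c.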

We shall discuss the description which quantum mechanics provides of 
the interference of photons. Let us take a definite experiment demonstrating 
interference. Suppose we have a beam of light which is passed through 
some kind of interferometer, so that it gets split up into two components 
and the two components are subsequently made to interfere. We may, as in 
the preceding section, take an incident beam consisting of only a single photon 
and inquire what will happen to it as it goes through the apparatus. This will 
present to us the difficulty of the conflict between the wave and corpuscular theories 
of light in an acute form. 

Corresponding to the description that we had in the case of the polarization, 
we must now describe the photon as going partly into each of the two components 
into which the incident beam is split. The photon is then, as we may say, 
in a translational state given by the superposition of the two translational states 
associated with the two components. We are thus led to a generalization of 
the term ‘translational state’ applied to a photon. For a photon to be in a definite 
translational state it need not be associated with one single beam of light, but may 
be associated with two or more beams of light which are the components into 
which one original beam has been split.† In the accurate mathematical theory
each translational state is associated with one of the wave functions of ordinary 
wave optics, which wave function may describe either a single beam or two or more 
beams into which one original beam has been split. Translational states are thus 
superposable in a similar way to wave functions. 

Let us consider now what happens when we determine the energy in one of 
the components. The result of such a determination must be either the whole 
photon or nothing at all. Thus the photon must change suddenly from being 
partly in one beam and partly in the other to being entirely in one of the beams. 
This sudden change is due to the disturbance in the translational state of 
the photon which the observation necessarily makes. It is impossible to predict 
in which of the two beams the photon will be found. Only the probability of 


*[Einstein, A. (1905). „Über einen die Erzeugung und Verwandlung des
Lichtes betreffenden heuristischen Gesichtspunkt“ Annalen der Physik, 322(6),
pp. 132-148, doi:10.1002/andp.19053220607. English translation: ‘On a Heuristic
Point of View about the Creation and Conversion of Light’ by Wikisource
{ https://en.wikisource.org/?curid=59468 }]

†The circumstance that the superposition idea requires us to generalize our original
meaning of translational states, but that no corresponding generalization was needed for 
the states of polarization of the preceding section, is an accidental one with no underlying 
theoretical significance. 
either result can be calculated from the previous distribution of the photon over
the two beams. 

One could carry out the energy measurement without destroying the component 
beam by, for example, reflecting the beam from a movable mirror and observing 
the recoil. Our description of the photon allows us to infer that, after such 
an energy measurement, it would not be possible to bring about any interference 
effects between the two components. So long as the photon is partly in one 
beam and partly in the other, interference can occur when the two beams are 
superposed, but this possibility disappears when the photon is forced entirely into 
one of the beams by an observation. The other beam then no longer enters into 
the description of the photon, so that it counts as being entirely in the one beam 
in the ordinary way for any experiment that may subsequently be performed on it. 

On these lines quantum mechanics is able to effect a reconciliation of the wave 
and corpuscular properties of light. The essential point is the association of each 
of the translational states of a photon with one of the wave functions of ordinary 
wave optics. The nature of this association cannot be pictured on a basis of classical 
mechanics, but is something entirely new. It would be quite wrong to picture 
the photon and its associated wave as interacting in the way in which particles 
and waves can interact in classical mechanics. The association can be interpreted 
only statistically, the wave function giving us information about the probability 
of our finding the photon in any particular place when we make an observation of 
where it is. 

Some time before the discovery of quantum mechanics people realized that 
the connexion between light waves and photons must be of a statistical character. 
What they did not clearly realize, however, was that the wave function gives 
information about the probability of one photon being in a particular place and not 
the probable number of photons in that place. The importance of the distinction 
can be made clear in the following way. Suppose we have a beam of light consisting 
of a large number of photons split up into two components of equal intensity. 
On the assumption that the intensity of a beam is connected with the probable 
number of photons in it, we should have half the total number of photons going 
into each component. If the two components are now made to interfere, we should 
require a photon in one component to be able to interfere with one in the other. 
Sometimes these two photons would have to annihilate one another and other times 
they would have to produce four photons. This would contradict the conservation 
of energy. The new theory, which connects the wave function with probabilities 
for one photon, gets over the difficulty by making each photon go partly into each 
of the two components. Each photon then interferes only with itself. Interference 
between two different photons never occurs. 
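
The statement that each photon interferes only with itself can be illustrated numerically. The following is a minimal sketch and is not taken from the text: it assumes a single photon divided equally between two components that are afterwards recombined, and it borrows the standard rule, established only later in the theory, that a probability is the squared modulus of a sum of amplitudes. The function name `detection_probability` is hypothetical.

```python
import cmath
import math


def detection_probability(phase_difference):
    """Probability of finding one photon in a chosen output after recombination.

    Assumption (not from the text): the photon reaches this output with
    amplitude 1/2 by way of each component, the second component carrying
    an extra phase. The amplitudes are added before the modulus is squared.
    """
    amplitude = 0.5 + 0.5 * cmath.exp(1j * phase_difference)
    return abs(amplitude) ** 2


for phase in (0.0, math.pi / 2, math.pi):
    p = detection_probability(phase)
    print(f"phase difference {phase:.3f} rad -> probability {p:.3f}")
```

Under these assumptions the probability for the one photon varies between zero and unity and never exceeds unity, so no creation or annihilation of photons is called for.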


The association of particles with waves discussed above is not restricted to 
the case of light, but is, according to modern theory, of universal applicability. 
All kinds of particles are associated with waves in this way and conversely all wave 
motion is associated with particles. Thus all particles can be made to exhibit 
interference effects and all wave motion has its energy in the form of quanta. 
The reason why these general phenomena are not more obvious is on account 
of a law of proportionality between the mass or energy of the particles and 
the frequency of the waves, the coefficient being such that for waves of familiar 
frequencies the associated quanta are extremely small, while for particles even 
as light as electrons the associated wave frequency is so high that it is not easy 
to demonstrate interference. 
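
The orders of magnitude involved may be checked with a short calculation. This is a sketch under stated assumptions: the law of proportionality is taken to be the Planck relation E = hν, the energy of a particle at rest is taken to be mc², and the frequency of 1 MHz is merely an illustrative choice of a familiar frequency. None of these symbols or values appears in the text above.

```python
PLANCK_H = 6.62607015e-34       # Planck's constant in J s (exact SI value)
SPEED_OF_LIGHT = 2.99792458e8   # speed of light in m/s (exact SI value)
ELECTRON_MASS = 9.1093837e-31   # electron rest mass in kg (rounded)


def quantum_energy(frequency_hz):
    """Energy in joules of one quantum of a wave of the given frequency (E = h * nu)."""
    return PLANCK_H * frequency_hz


def rest_energy_frequency(mass_kg):
    """Frequency in hertz associated with the rest energy m c^2 of a particle."""
    return mass_kg * SPEED_OF_LIGHT ** 2 / PLANCK_H


print(f"quantum of a 1 MHz wave: {quantum_energy(1.0e6):.3e} J")
print(f"frequency for an electron at rest: {rest_energy_frequency(ELECTRON_MASS):.3e} Hz")
```

The quantum of a radio-frequency wave comes out of the order of 10⁻²⁷ J, while the frequency associated with an electron is of the order of 10²⁰ Hz.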


4. Superposition and indeterminacy 

The reader may possibly feel dissatisfied with the attempt in the two preceding 
sections to fit in the existence of photons with the classical theory of light. 
It may be argued† that a very strange idea has been introduced—the possibility 
of a photon being partly in each of two states of polarization, or partly in each 
of two separate beams—but even with the help of this strange idea no satisfying 
picture of the fundamental single-photon processes has been given. One may say‡ 
further that this strange idea did not provide any information about experimental 
results for the experiments discussed, beyond what could have been obtained from 
an elementary consideration of photons being guided in some vague way by waves. 
What, then, is the use of the strange idea? 

In answer to the first criticism it may be remarked that the main object 
of physical science is not the provision of pictures, but is the formulation of 
laws governing phenomena and the application of these laws to the discovery 
of new phenomena. If a picture exists, so much the better; but whether 
a picture exists or not is a matter of only secondary importance. In the case 
of atomic phenomena no picture can be expected to exist in the usual sense 
of the word ‘picture’, by which is meant a model functioning essentially on 
classical lines. One may, however, extend the meaning of the word ‘picture’ 
to include any way of looking at the fundamental laws which makes their 
self-consistency obvious. With this extension, one may gradually acquire a picture 
of atomic phenomena by becoming familiar with the laws of the quantum theory. 

With regard to the second criticism, it may be remarked that for many simple 
experiments with light, an elementary theory of waves and photons connected in 
a vague statistical way would be adequate to account for the results. In the case 
of such experiments quantum mechanics has no further information to give. 


In the great majority of experiments, however, the conditions are too complex for 
an elementary theory of this kind to be applicable and some more elaborate scheme, 
such as is provided by quantum mechanics, is then needed. The method of 
description that quantum mechanics gives in the more complex cases is applicable 
also to the simple cases and although it is then not really necessary for accounting 
for the experimental results, its study in these simple cases is perhaps a suitable 
introduction to its study in the general case. 

†[Original: He may argue] 
‡[Original: He may say] 

There remains an overall criticism that one may make to the whole scheme, 
namely, that in departing from the determinacy of the classical theory a great 
complication is introduced into the description of Nature, which is a highly 
undesirable feature. This complication is undeniable, but it is offset by a great 
simplification, provided by the general principle of superposition of states, 
which we shall now go on to consider. But first it is necessary to make precise 
the important concept of a ‘state’ of a general atomic system. 

Let us take any atomic system, composed of particles or bodies with specified 
properties (mass, moment of inertia, etc.) interacting according to specified laws 
of force. There will be various possible motions of the particles or bodies consistent 
with the laws of force. Each such motion is called a state of the system. 
According to classical ideas one could specify a state by giving numerical values 
to all the coordinates and velocities of the various component parts of the system 
at some instant of time, the whole motion being then completely determined. 
Now the argument about the disturbance of observation* shows that we cannot 
observe a small system with that amount of detail which classical theory supposes. 
The limitation in the power of observation puts a limitation on the number of 
data that can be assigned to a state. Thus a state of an atomic system must be 
specified by fewer or more indefinite data than a complete set of numerical values 
for all the coordinates and velocities at some instant of time. In the case when 
the system is just a single photon, a state would be completely specified by a given 
state of motion in the sense of §3 together with a given state of polarization in 
the sense of §2. 

A state of a system may be defined as an undisturbed motion that is restricted 
by as many conditions or data as are theoretically possible without mutual 
interference or contradiction. In practice the conditions could be imposed by 
a suitable preparation of the system, consisting perhaps in passing it through 
various kinds of sorting apparatus, such as slits and polarimeters, the system being 
left undisturbed after the preparation. The word ‘state’ may be used to mean either 
the state at one particular time (after the preparation), or the state throughout 
the whole of time after the preparation. To distinguish these two meanings, 
the latter will be called a ‘state of motion’ when there is liable to be ambiguity. 


*[cf. pp. 2 and 3] 

The general principle of superposition of quantum mechanics applies to 
the states, with either of the above meanings, of any one dynamical system. 
It requires us to assume that between these states there exist peculiar relationships 
such that whenever the system is definitely in one state we can consider it as being 
partly in each of two or more other states. The original state must be regarded as 
the result of a kind of superposition of the two or more new states, in a way that 
cannot be conceived on classical ideas. Any state may be considered as the result 
of a superposition of two or more other states, and indeed in an infinite number 
of ways. Conversely any two or more states may be superposed to give a new state. 
The procedure of expressing a state as the result of superposition of a number of 
other states is a mathematical procedure that is always permissible, independent 
of any reference to physical conditions, like the procedure of resolving a wave 
into Fourier components. Whether it is useful in any particular case, though, 
depends on the special physical conditions of the problem under consideration. 

In the two preceding sections examples were given of the superposition principle 
applied to a system consisting of a single photon. §2 dealt with states differing 
only with regard to the polarization and §3 with states differing only with regard 
to the motion of the photon as a whole. 

The nature of the relationships which the superposition principle requires 
to exist between the states of any system is of a kind that cannot be explained 
in terms of familiar physical concepts. One cannot in the classical sense picture 
a system being partly in each of two states and see the equivalence of this to 
the system being completely in some other state. There is an entirely new idea 
involved, to which one must get accustomed and in terms of which one must 
proceed to build up an exact mathematical theory, without having any detailed 
classical picture. 

When a state is formed by the superposition of two other states, it will 
have properties that are in some vague way intermediate between those of 
the two original states and that approach more or less closely to those of 
either of them according to the greater or less ‘weight’ attached to this state 
in the superposition process. The new state is completely defined by the two 
original states when their relative weights in the superposition process are known, 
together with a certain phase difference, the exact meaning of weights and phases 
being provided in the general case by the mathematical theory. In the case of 
the polarization of a photon their meaning is that provided by classical optics, 
so that, for example, when two perpendicularly plane polarized states are 
superposed with equal weights, the new state may be circularly polarized in 
either direction, or linearly polarized at an angle π/4, or else elliptically polarized, 
according to the phase difference. 
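
The dependence on the phase difference can be made concrete with the amplitudes of classical optics. The sketch below is illustrative only: it represents the two perpendicular plane-polarized components by a pair of complex amplitudes (a Jones-style description, which the text does not introduce), and the function names are hypothetical.

```python
import cmath
import math


def superpose(weight_x, weight_y, phase_difference):
    """Amplitudes (E_x, E_y) of two perpendicular plane-polarized components."""
    return (complex(weight_x), weight_y * cmath.exp(1j * phase_difference))


def classify(amplitudes, tol=1e-12):
    """Label the resulting polarization as 'linear', 'circular' or 'elliptical'."""
    e_x, e_y = amplitudes
    product = e_x.conjugate() * e_y
    if abs(product.imag) < tol:
        # Components in phase or in antiphase: the field oscillates along a line.
        return "linear"
    if abs(abs(e_x) - abs(e_y)) < tol and abs(product.real) < tol:
        # Equal magnitudes with a quarter-period phase difference.
        return "circular"
    return "elliptical"


def linear_angle(amplitudes):
    """Inclination of a linear polarization to the x axis, in radians."""
    e_x, e_y = amplitudes
    return math.atan2(e_y.real, e_x.real)


for phase in (0.0, math.pi / 2, -math.pi / 2, math.pi / 3):
    print(f"phase difference {phase:+.3f} rad -> {classify(superpose(1, 1, phase))}")
```

With equal weights, a phase difference of zero gives linear polarization at π/4, phase differences of ±π/2 give the two senses of circular polarization, and other values give elliptical polarization.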




The non-classical nature of the superposition process is brought out clearly 
if we consider the superposition of two states, A and B, such that there exists 
an observation which, when made on the system in state A, is certain to lead to 
one particular result, a say, and when made on the system in state B is certain 
to lead to some different result, b say. What will be the result of the observation 
when made on the system in the superposed state? The answer is that the result 
will be sometimes a and sometimes b, according to a probability law depending 
on the relative weights of A and B in the superposition process. It will never 
be different from both a and b. The intermediate character of the state formed 
by superposition thus expresses itself through the probability of a particular result 
for an observation being intermediate between the corresponding probabilities for 
the original states,† not through the result itself being intermediate between the 
corresponding results for the original states. 
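
A small simulation may help to fix the idea. It is a sketch resting on an assumption that this section does not yet state: the probability law is taken to be the standard one, in which the probabilities of a and b are proportional to the squared moduli of the coefficients of A and B. The names `outcome_probabilities` and `observe`, and the coefficients 1 and 2i, are hypothetical choices made for illustration.

```python
import random


def outcome_probabilities(c_a, c_b):
    """Probabilities of the results a and b for the superposition c_a A + c_b B.

    Assumption: the probabilities are proportional to |c_a|^2 and |c_b|^2.
    """
    weight_a = abs(c_a) ** 2
    weight_b = abs(c_b) ** 2
    total = weight_a + weight_b
    if total == 0:
        raise ValueError("both coefficients vanish, so there is no state")
    return weight_a / total, weight_b / total


def observe(c_a, c_b, rng):
    """Simulate one observation; the result is always 'a' or 'b', never between."""
    p_a, _ = outcome_probabilities(c_a, c_b)
    return "a" if rng.random() < p_a else "b"


rng = random.Random(0)  # fixed seed so that the run can be repeated
results = [observe(1.0, 2.0j, rng) for _ in range(10000)]
fraction_a = results.count("a") / len(results)
print(f"fraction of observations giving a: {fraction_a:.3f}")
```

Every simulated observation gives either a or b, and only the relative frequency of the two results reflects the weights.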

In this way we see that such a drastic departure from ordinary ideas as 
the assumption of superposition relationships between the states is possible 
only on account of the recognition of the importance of the disturbance 
accompanying an observation and of the consequent indeterminacy in the result 
of the observation. When an observation is made on any atomic system that is in 
a given state, in general the result will not be determinate, i.e., if the experiment 
is repeated several times under identical conditions several different results may 
be obtained. It is a law of nature, though, that if the experiment is repeated 
a large number of times, each particular result will be obtained in a definite 
fraction of the total number of times, so that there is a definite probability of 
its being obtained. This probability is what the theory sets out to calculate. 
Only in special cases when the probability for some result is unity is the result of 
the experiment determinate. 

The assumption of superposition relationships between the states leads to 
a mathematical theory in which the equations that define a state are linear in 
the unknowns. In consequence of this, people have tried to establish analogies 
with systems in classical mechanics, such as vibrating strings or membranes, 
which are governed by linear equations and for which, therefore, a superposition 
principle holds. Such analogies have led to the name ‘Wave Mechanics’ being 
sometimes given to quantum mechanics. It is important to remember, however, 
that the superposition that occurs in quantum mechanics is of an essentially 
different nature from any occurring in the classical theory, as is shown by the fact 
that the quantum superposition principle demands indeterminacy in the results 
of observations in order to be capable of a sensible physical interpretation. 
The analogies are thus liable to be misleading. 

†The probability of a particular result for the state formed by superposition is not always 
intermediate between those for the original states in the general case when those for the original 
states are not zero or unity, so there are restrictions on the ‘intermediateness’ of a state formed 
by superposition. 


5. Mathematical formulation of the principle 

A profound change has taken place during the Twentieth Century in the opinions 
physicists have held on the mathematical foundations of their subject. 
Previously they supposed that the principles of Newtonian mechanics would 
provide the basis for the description of the whole of physical phenomena and 
that all the theoretical physicist had to do was suitably to develop and apply 
these principles. With the recognition that there is no logical reason why 
Newtonian and other classical principles should be valid outside the domains 
in which they have been experimentally verified has come the realization that 
departures from these principles are indeed necessary. Such departures find their 
expression through the introduction of new mathematical formalisms, new schemes 
of axioms and rules of manipulation, into the methods of theoretical physics. 

Quantum mechanics provides a good example of the new ideas. It requires 
the states of a dynamical system and the dynamical variables to be interconnected 
in quite strange ways that are unintelligible from the classical standpoint. 
The states and dynamical variables have to be represented by mathematical 
quantities of different natures from those ordinarily used in physics. The new 
scheme becomes a precise physical theory when all the axioms and rules 
of manipulation governing the mathematical quantities are specified and 
when in addition certain laws are laid down connecting physical facts with 
the mathematical formalism, so that from any given physical conditions 
equations between the mathematical quantities may be inferred and vice versa. 
In an application of the theory one would be given certain physical 
information, which one would proceed to express by equations between 
the mathematical quantities. One would then deduce new equations with the help 
of the axioms and rules of manipulation and would conclude by interpreting 
these new equations as physical conditions. The justification for the whole 
scheme depends, apart from internal consistency, on the agreement of the final 
results with experiment. 

We shall begin to set up the scheme by dealing with the mathematical relations 
between the states of a dynamical system at one instant of time, which relations 
will come from the mathematical formulation of the principle of superposition. 
The superposition process is a kind of additive process and implies that states can 
in some way be added to give new states. The states must therefore be connected 
with mathematical quantities of a kind which can be added together to give other 
quantities of the same kind. The most obvious of such quantities are vectors. 
Ordinary vectors, existing in a space of a finite number of dimensions, are not 
sufficiently general for most of the dynamical systems in quantum mechanics. 
We have to make a generalization to vectors in a space of an infinite number 
of dimensions, and the mathematical treatment becomes complicated by questions 
of convergence. For the present, however, we shall deal merely with some general 
properties of the vectors, properties which can be deduced on the basis of a simple 
scheme of axioms, and questions of convergence and related topics will not be gone 
into until the need arises. 

It is desirable to have a special name for describing the vectors which are 
connected with the states of a system in quantum mechanics, whether they are 
in a space of a finite or an infinite number of dimensions. We shall call them 
ket vectors, or simply kets, and denote a general one of them by a special symbol | ⟩. 
If we want to specify a particular one of them by a label, A say, we insert it 
in the middle, thus |A⟩. The suitability of this notation will become clear as 
the scheme is developed. 

Ket vectors may be multiplied by complex numbers and may be added together 
to give other ket vectors, e.g. from two ket vectors |A⟩ and |B⟩ we can form 

$$c_1|A\rangle + c_2|B\rangle = |R\rangle \tag{1}$$

say, where c₁ and c₂ are any two complex numbers. We may also perform more 
general linear processes with them, such as adding an infinite sequence of them, 
and if we have a ket vector |x⟩, depending on and labelled by a parameter x which 
can take on all values in a certain range, we may integrate it with respect to x, 
to get another ket vector 

$$\int |x\rangle\,dx = |Q\rangle$$

say. A ket vector which is expressible linearly in terms of certain others is said 
to be dependent on them. A set of ket vectors are called independent if no one of 
them is expressible linearly in terms of the others. 

We now assume that each state of a dynamical system at a particular time 
corresponds to a ket vector, the correspondence being such that if a state results 
from the superposition of certain other states, its corresponding ket vector is 
expressible linearly in terms of the corresponding ket vectors of the other states, 
and conversely. Thus the state R results from the superposition of the states A 
and B when the corresponding ket vectors are connected by (1). 

The above assumption leads to certain properties of the superposition process, 
properties which are in fact necessary for the word ‘superposition’ to be 
appropriate. When two or more states are superposed, the order in which 
they occur in the superposition process is unimportant, so the superposition 
process is symmetrical between the states that are superposed. Again, we see from 
equation (1) that (excluding the case when the coefficient c₁ or c₂ is zero) if the state 
R can be formed by superposition of the states A and B, then the state A can 
be formed by superposition of B and R, and B can be formed by superposition of 
A and R. The superposition relationship is symmetrical between all three states 
A, B and R. 

A state which results from the superposition of certain other states will be 
said to be dependent on those states. More generally, a state will be said to be 
dependent on any set of states, finite or infinite in number, if its corresponding 
ket vector is dependent on the corresponding ket vectors of the set of states. A set 
of states will be called independent if no one of them is dependent on the others. 

To proceed with the mathematical formulation of the superposition principle 
we must introduce a further assumption, namely the assumption that by 
superposing a state with itself we cannot form any new state, but only the original 
state over again. If the original state corresponds to the ket vector |A⟩, when it is 
superposed with itself the resulting state will correspond to 

$$c_1|A\rangle + c_2|A\rangle = (c_1 + c_2)|A\rangle,$$

where c₁ and c₂ are numbers. Now we may have c₁ + c₂ = 0, in which case 
the result of the superposition process would be nothing at all, the two components 
having cancelled each other by an interference effect. Our new assumption 
requires that, apart from this special case, the resulting state must be the same 
as the original one, so that (c₁ + c₂)|A⟩ must correspond to the same state that 
|A⟩ does. Now c₁ + c₂ is an arbitrary complex number and hence we can conclude 
that if the ket vector corresponding to a state is multiplied by any complex number, 
not zero, the resulting ket vector will correspond to the same state. Thus a state 
is specified by the direction of a ket vector and any length one may assign to 
the ket vector is irrelevant. All the states of the dynamical system are in one-one 
correspondence with all the possible directions for a ket vector, no distinction being 
made between the directions of the ket vectors |A⟩ and −|A⟩. 
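
The correspondence between states and directions can be tried out in a space of two dimensions. The following is a minimal sketch which assumes that a ket is represented by a tuple of complex components (a representation the text has not yet introduced); the helper names `scale`, `add` and `same_state` are hypothetical.

```python
def scale(c, ket):
    """Multiply a ket, given as a tuple of complex components, by the number c."""
    return tuple(c * x for x in ket)


def add(ket1, ket2):
    """Add two kets component by component."""
    return tuple(x + y for x, y in zip(ket1, ket2))


def same_state(ket1, ket2, tol=1e-12):
    """True if two non-zero kets differ only by a complex numerical factor."""
    if all(abs(x) < tol for x in ket1) or all(abs(x) < tol for x in ket2):
        raise ValueError("the zero ket corresponds to no state at all")
    n = len(ket1)
    # Two kets are proportional exactly when every 2x2 minor vanishes.
    return all(
        abs(ket1[i] * ket2[j] - ket1[j] * ket2[i]) < tol
        for i in range(n)
        for j in range(i + 1, n)
    )


ket_a = (1 + 0j, 2j)
# Superposing the state with itself, with arbitrary coefficients c1 and c2.
superposed = add(scale(2 - 1j, ket_a), scale(0.5 + 3j, ket_a))

print(same_state(ket_a, superposed))        # the original state over again
print(same_state(ket_a, scale(-1, ket_a)))  # |A> and -|A> are not distinguished
print(same_state(ket_a, (1 + 0j, 1 + 0j)))  # a different direction
```

The test is made on proportionality and not on equality, since the length of the ket is irrelevant to the state.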

The assumption just made shows up very clearly the fundamental difference 
between the superposition of the quantum theory and any kind of classical 
superposition. In the case of a classical system for which a superposition 
principle holds, for instance a vibrating membrane, when one superposes 
a state with itself the result is a different state, with a different magnitude 
of the oscillations. There is no physical characteristic of a quantum state 
corresponding to the magnitude of the classical oscillations, as distinct from 
their quality, described by the ratios of the amplitudes at different points of 
the membrane. Again, while there exists a classical state with zero amplitude 
of oscillation everywhere, namely the state of rest, there does not exist any 
corresponding state for a quantum system, the zero ket vector corresponding to 
no state at all. 

Given two states corresponding to the ket vectors |A⟩ and |B⟩, the general state 
formed by superposing them corresponds to a ket vector |R⟩ which is determined 
by two complex numbers, namely the coefficients c₁ and c₂ of equation (1). If these 
two coefficients are multiplied by the same factor (itself a complex number), the ket 
vector |R⟩ will get multiplied by this factor and the corresponding state will be 
unaltered. Thus only the ratio of the two coefficients is effective in determining 
the state R. Hence this state is determined by one complex number, or by two 
real parameters. Thus from two given states, a twofold infinity of states may be 
obtained by superposition. 

This result is confirmed by the examples discussed in §§2 and 3. In the example 
of §2 there are just two independent states of polarization for a photon, which may 
be taken to be the states of plane polarization parallel and perpendicular to 
some fixed direction, and from the superposition of these two a twofold infinity 
of states of polarization can be obtained, namely all the states of elliptic 
polarization, the general one of which requires two parameters to describe it. 
Again, in the example of §3, from the superposition of two given states of 
motion for a photon a twofold infinity of states of motion may be obtained, 
the general one of which is described by two parameters, which may be taken to be 
the ratio of the amplitudes of the two wave functions that are added together and 
their phase relationship. This confirmation shows the need for allowing complex 
coefficients in equation (1). If these coefficients were restricted to be real, then, 
since only their ratio is of importance for determining the direction of the resultant 
ket vector |R⟩ when |A⟩ and |B⟩ are given, there would be only a simple infinity 
of states obtainable from the superposition. 
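
The counting of parameters may be written out explicitly. In the following sketch the symbols r and δ are introduced purely for illustration and do not occur in the text; the coefficient c₁ is assumed to be different from zero, so that equation (1) may be divided by it without altering the state.

```latex
\frac{1}{c_1}\,|R\rangle = |A\rangle + \frac{c_2}{c_1}\,|B\rangle,
\qquad
\frac{c_2}{c_1} = r\,e^{i\delta},
\quad r \ge 0, \quad 0 \le \delta < 2\pi .
```

The two real parameters r and δ correspond to the ratio of the amplitudes and the phase relationship. With real coefficients δ could take only the values 0 and π, and the single real number c₂/c₁ would remain as the only parameter.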


6. Bra and ket vectors 

Whenever we have a set of vectors in any mathematical theory, we can always set up 
a second set of vectors, which mathematicians call the dual vectors. The procedure 
will be described for the case when the original vectors are our ket vectors. 

Suppose we have a number φ which is a function of a ket vector |A⟩, 
i.e. to each ket vector |A⟩ there corresponds one number φ, and suppose further 
that the function is a linear one, which means that the number corresponding 
to |A⟩ + |A′⟩ is the sum of the numbers corresponding to |A⟩ and to |A′⟩, 
and the number corresponding to c|A⟩ is c times the number corresponding to |A⟩, 
c being any numerical factor. Then the number φ corresponding to any |A⟩ may be 
looked upon as the scalar product of that |A⟩ with some new vector, there being one 
of these new vectors for each linear function of the ket vectors |A⟩. The justification 
for this way of looking at φ is that, as will be seen later (see equations (5) 
and (6)), the new vectors may be added together and may be multiplied by numbers 
to give other vectors of the same kind. The new vectors are, of course, defined 
only to the extent that their scalar products with the original ket vectors are 
given numbers, but this is sufficient for one to be able to build up a mathematical 
theory about them. 

We shall call the new vectors bra vectors, or simply bras, and denote a general 
one of them by the symbol ⟨ |, the mirror image of the symbol for a ket vector. 
If we want to specify a particular one of them by a label, B say, we write it in 
the middle, thus ⟨B|. The scalar product of a bra vector ⟨B| and a ket vector 
|A⟩ will be written ⟨B|A⟩, i.e. as a juxtaposition of the symbols for the bra and 
ket vectors, that for the bra vector being on the left, and the two vertical lines 
being contracted to one for brevity. 

One may look upon the symbols ⟨ and ⟩ as a distinctive kind of brackets. 
A scalar product ⟨B|A⟩ now appears as a complete bracket expression and 
a bra vector ⟨B| or a ket vector |A⟩ as an incomplete bracket expression. 
We have the rules that any complete bracket expression denotes a number and any 
incomplete bracket expression denotes a vector, of the bra or ket kind according to 
whether it contains the first or second part of the brackets. 

The condition that the scalar product of ⟨B| and |A⟩ is a linear function of |A⟩ 
may be expressed symbolically by 

$$\langle B|\{|A\rangle + |A'\rangle\} = \langle B|A\rangle + \langle B|A'\rangle, \tag{2}$$

$$\langle B|\{c|A\rangle\} = c\,\langle B|A\rangle, \tag{3}$$

c being any number. 

A bra vector is considered to be completely defined when its scalar product 
with every ket vector is given, so that if a bra vector has its scalar product with 
every ket vector vanishing, the bra vector itself must be considered as vanishing. 
In symbols, if 

$$\langle P|A\rangle = 0 \quad \text{for all } |A\rangle,$$

then 

$$\langle P| = 0. \tag{4}$$

The sum of two bra vectors ⟨B| and ⟨B′| is defined by the condition that its scalar 
product with any ket vector |A⟩ is the sum of the scalar products of ⟨B| and ⟨B′| 
with |A⟩, 

$$\{\langle B| + \langle B'|\}|A\rangle = \langle B|A\rangle + \langle B'|A\rangle, \tag{5}$$

and the product of a bra vector ⟨B| and a number c is defined by the condition 
that its scalar product with any ket vector |A⟩ is c times the scalar product of ⟨B| 
with |A⟩, 

$$\{c\langle B|\}|A\rangle = c\,\langle B|A\rangle. \tag{6}$$

Equations (2) and (5) show that products of bra and ket vectors satisfy 
the distributive axiom of multiplication, and equations (3) and (6) show that 
multiplication by numerical factors satisfies the usual algebraic axioms. 
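
Equations (2), (3), (5) and (6) can be verified in a small numerical model. The sketch below assumes a space of two dimensions in which a ket is a tuple of complex components and a bra is a linear function built from a tuple of numbers of its own; this finite representation, and the names `make_bra`, `add_bras` and `scale_bra`, are hypothetical and are not part of the text.

```python
def make_bra(components):
    """Return a bra as a linear function sending each ket to a number."""
    def bra(ket):
        return sum(b * a for b, a in zip(components, ket))
    return bra


def add_bras(bra1, bra2):
    """Sum of two bras, defined through its scalar products as in equation (5)."""
    return lambda ket: bra1(ket) + bra2(ket)


def scale_bra(c, bra):
    """Product of a number and a bra, defined as in equation (6)."""
    return lambda ket: c * bra(ket)


def add_kets(ket1, ket2):
    return tuple(x + y for x, y in zip(ket1, ket2))


def scale_ket(c, ket):
    return tuple(c * x for x in ket)


bra_b = make_bra((2 - 1j, 1j))
bra_b2 = make_bra((1 + 1j, -3 + 0j))
ket_a = (1 + 2j, 3 - 1j)
ket_a2 = (0.5j, 2 + 0j)
c = 0.5 - 2j

print(bra_b(ket_a))  # a complete bracket expression denotes a number
print(abs(bra_b(add_kets(ket_a, ket_a2)) - (bra_b(ket_a) + bra_b(ket_a2))) < 1e-12)  # (2)
print(abs(bra_b(scale_ket(c, ket_a)) - c * bra_b(ket_a)) < 1e-12)                    # (3)
```

In this model a bra is known only through the numbers it assigns to kets, in keeping with the remark that the new vectors are defined only to the extent that their scalar products with the kets are given.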


The bra vectors, as they have been here introduced, are quite a different kind 
of vector from the kets, and so far there is no connexion between them except for 
the existence of a scalar product of a bra and a ket. We now make the assumption 
that there is a one-one correspondence between the bras and the kets, such that 
the bra corresponding to |A⟩ + |A′⟩ is the sum of the bras corresponding to |A⟩ 
and to |A′⟩, and the bra corresponding to c|A⟩ is c̄ times the bra corresponding 
to |A⟩, c̄ being the conjugate complex number to c. We shall use the same label 
to specify a ket and the corresponding bra. Thus the bra corresponding to |A⟩ will 
be written ⟨A|. 


The relationship between a ket vector and the corresponding bra makes 
it reasonable to call one of them the conjugate imaginary of the other. Our bra 
and ket vectors are complex quantities, since they can be multiplied by complex 
numbers and are then of the same nature as before, but they are complex quantities 
of a special kind which cannot be split up into real and imaginary parts.† The usual 
method of getting the real part of a complex quantity, by taking half the sum of 
the quantity itself and its conjugate, cannot be applied since a bra and a ket vector 
are of different natures and cannot be added together. To call attention to this 
distinction, we shall use the words ‘conjugate complex’ to refer to numbers and 
other complex quantities which can be split up into real and imaginary parts, 
and the words ‘conjugate imaginary’ for bra and ket vectors, which cannot. 
With the former kind of quantity, we shall use the notation of putting a bar 
over one of them to get the conjugate complex one. 

On account of the one-one correspondence between bra vectors and ket vectors, 
any state of our dynamical system at a particular time may be specified by 
the direction of a bra vector just as well as by the direction of a ket vector. In fact 
the whole theory will be symmetrical in its essentials between bras and kets. 

Given any two ket vectors |A⟩ and |B⟩, we can construct from them a number 
⟨B|A⟩ by taking the scalar product of the first with the conjugate imaginary 
of the second. This number depends linearly on |A⟩ and antilinearly on |B⟩, 
the antilinear dependence meaning that the number formed from |B⟩ + |B′⟩ is 
the sum of the numbers formed from |B⟩ and from |B′⟩, and the number formed 
from c|B⟩ is c̄ times the number formed from |B⟩. There is a second way in which 
we can construct a number which depends linearly on |A⟩ and antilinearly on |B⟩, 
namely by forming the scalar product of |B⟩ with the conjugate imaginary of |A⟩ 
and taking the conjugate complex of this scalar product. We assume that these 
two numbers are always equal, i.e. 

$$\langle B|A\rangle = \overline{\langle A|B\rangle}. \tag{7}$$

Putting |B⟩ = |A⟩ here, we find that the number ⟨A|A⟩ must be real. We make 
the further assumption 

$$\langle A|A\rangle > 0, \tag{8}$$

except when |A⟩ = 0. 

In ordinary space, from any two vectors one can construct a number—their 
scalar product—which is a real number and is symmetrical between them. 
In the space of bra vectors or the space of ket vectors, from any two vectors one can 
again construct a number—the scalar product of one with the conjugate imaginary 
of the other—but this number is complex and goes over into the conjugate 
complex number when the two vectors are interchanged. There is thus a kind of 
perpendicularity in these spaces, which is a generalization of the perpendicularity 
in ordinary space. We shall call a bra and a ket vector orthogonal if their 
scalar product is zero, and two bras or two kets will be called orthogonal 
if the scalar product of one with the conjugate imaginary of the other is zero. 
Further we shall say that two states of our dynamical system are orthogonal 
if the vectors corresponding to these states are orthogonal. 
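
Equations (7) and (8) and the definition of orthogonality can be checked in the same kind of finite model. This is a sketch only: it assumes that kets are tuples of complex components and that the bra conjugate imaginary to a ket is obtained by taking the conjugate complex of each component, which is one concrete realization of the assumptions made above and not the general case. The name `bracket` is hypothetical.

```python
def bracket(ket_b, ket_a):
    """Scalar product <B|A> of ket A with the conjugate imaginary of ket B."""
    return sum(b.conjugate() * a for b, a in zip(ket_b, ket_a))


ket_a = (1 + 1j, 2 + 0j)
ket_b = (1j, 1 - 1j)

print(bracket(ket_b, ket_a))              # <B|A>
print(bracket(ket_a, ket_b).conjugate())  # conjugate complex of <A|B>, equation (7)
print(bracket(ket_a, ket_a))              # <A|A> is real and positive, equation (8)

# Two kets whose scalar product vanishes are orthogonal.
ket_u = (1 + 0j, 1j)
ket_v = (1 + 0j, -1j)
print(bracket(ket_u, ket_v))
```

Interchanging the two kets changes the scalar product into its conjugate complex, so that the vanishing of one implies the vanishing of the other and orthogonality is a symmetrical relation.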


†[‘pure’ is omitted.] 

The length of a bra vector ⟨A| or of the conjugate imaginary ket vector |A⟩ 
is defined as the square root of the positive number ⟨A|A⟩. When we are given 
a state and wish to set up a bra or ket vector to correspond to it, only the direction 
of the vector is given and the vector itself is undetermined to the extent of 
an arbitrary numerical factor. It is often convenient to choose this numerical 
factor so that the vector is of length unity. This procedure is called normalization 
and the vector so chosen is said to be normalized. The vector is not completely 
determined even then, since one can still multiply it by any number of modulus 
unity, i.e. any number e^(iγ) where γ is real, without changing its length. We shall 
call such a number a phase factor. 
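
Normalization and the residual freedom of a phase factor may be illustrated as follows. The sketch assumes, as an illustration only, a ket given by two complex components with the scalar product formed by conjugating the components of the bra; the names `length` and `normalize` and the value 0.7 for the phase are hypothetical.

```python
import cmath
import math


def length(ket):
    """Square root of <A|A> for a ket given as a tuple of complex components."""
    return math.sqrt(sum(abs(x) ** 2 for x in ket))


def normalize(ket):
    """Return a ket of length unity in the same direction as the given ket."""
    norm = length(ket)
    if norm == 0:
        raise ValueError("the zero ket cannot be normalized")
    return tuple(x / norm for x in ket)


ket = (3 + 0j, 4j)
unit = normalize(ket)

# Multiplying by a phase factor e^(i*gamma) leaves the length unchanged.
gamma = 0.7
rotated = tuple(cmath.exp(1j * gamma) * x for x in unit)

print(length(ket), length(unit), length(rotated))
```

The normalized ket and the ket multiplied by the phase factor have the same length and the same direction, and so correspond to the same state.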

The foregoing assumptions give the complete scheme of relations between 
the states of a dynamical system at a particular time. The relations appear 
in mathematical form, but they imply physical conditions, which will lead to 
results expressible in terms of observations when the theory is developed further. 
For instance, if two states are orthogonal, it means at present simply a certain 
equation in our formalism, but this equation implies a definite physical relationship 
between the states, which further developments of the theory will enable us 
to interpret in terms of observational results (see the paragraphs on measurement 
and orthogonality of states). 


7. Linear operators 

In the preceding section we considered a number which is a linear function of
a ket vector, and this led to the concept of a bra vector. We shall now consider 
a ket vector which is a linear function of a ket vector, and this will lead to 
the concept of a linear operator. 

Suppose we have a ket |F⟩ which is a function of a ket |A⟩, i.e. to each ket |A⟩
there corresponds one ket |F⟩, and suppose further that the function is a linear one,
which means that the |F⟩ corresponding to |A⟩ + |A′⟩ is the sum of the |F⟩'s
corresponding to |A⟩ and to |A′⟩, and the |F⟩ corresponding to c|A⟩ is c times
the |F⟩ corresponding to |A⟩, c being any numerical factor. Under these conditions,
we may look upon the passage from |A⟩ to |F⟩ as the application of a linear operator
to |A⟩. Introducing the symbol α for the linear operator, we may write

$$|F\rangle = \alpha|A\rangle,$$

in which the result of α operating on |A⟩ is written like a product of α with |A⟩.
We make the rule that in such products the ket vector must always be put on
the right of the linear operator. The above conditions of linearity may now be
expressed by the equations

$$\alpha\{|A\rangle + |A'\rangle\} = \alpha|A\rangle + \alpha|A'\rangle,$$
$$\alpha\{c|A\rangle\} = c\,\alpha|A\rangle. \tag{1}$$

A linear operator is considered to be completely defined when the result of 
its application to every ket vector is given. Thus a linear operator is to be 
considered zero if the result of its application to every ket vanishes, and two linear 
operators are to be considered equal if they produce the same result when applied 
to every ket. 
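
The two conditions of linearity can be checked on a small numerical example. This is only a sketch, assuming that kets are lists of complex components and that the linear operator is represented by a square matrix stored as a list of rows. The helper names and the sample numbers are hypothetical.

```python
def apply(op, ket):
    """Apply a matrix (list of rows) to a ket (list of components)."""
    return [sum(row[j] * ket[j] for j in range(len(ket))) for row in op]


def add(k1, k2):
    """Sum of two kets, component by component."""
    return [x + y for x, y in zip(k1, k2)]


def scale(c, ket):
    """Product of the number c with a ket."""
    return [c * x for x in ket]


alpha = [[1, 2j], [0, 3]]   # assumed sample operator
ket_a = [1 + 1j, 2]
ket_a2 = [0.5j, -1]
c = 2 - 3j

# First linearity condition: alpha{|A> + |A'>} = alpha|A> + alpha|A'>
lhs_sum = apply(alpha, add(ket_a, ket_a2))
rhs_sum = add(apply(alpha, ket_a), apply(alpha, ket_a2))

# Second linearity condition: alpha{c|A>} = c alpha|A>
lhs_scale = apply(alpha, scale(c, ket_a))
rhs_scale = scale(c, apply(alpha, ket_a))
```

Any matrix satisfies both conditions, which is why matrices serve as a convenient concrete picture of linear operators in a finite number of dimensions.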

Linear operators can be added together, the sum of two linear operators being
defined to be that linear operator which, operating on any ket, produces the sum of
what the two linear operators separately would produce. Thus α + β is defined by

$$\{\alpha + \beta\}|A\rangle = \alpha|A\rangle + \beta|A\rangle \tag{2}$$

for any |A⟩. Equation (2) and the first of equations (1) show that products of
linear operators with ket vectors satisfy the distributive axiom of multiplication.

Linear operators can also be multiplied together, the product of two linear
operators being defined as that linear operator, the application of which to any ket
produces the same result as the application of the two linear operators successively.
Thus the product αβ is defined as the linear operator which, operating on any
ket |A⟩, changes it into that ket which one would get by operating first on |A⟩
with β, and then on the result of the first operation with α. In symbols

$$\{\alpha\beta\}|A\rangle = \alpha\{\beta|A\rangle\}.$$

This definition appears as the associative axiom of multiplication for the triple
product of α, β, and |A⟩, and allows us to write this triple product as αβ|A⟩
without brackets. However, this triple product is in general not the same as
what we should get if we operated on |A⟩ first with α and then with β, i.e. in
general αβ|A⟩ differs from βα|A⟩, so that in general αβ must differ from βα.
The commutative axiom of multiplication does not hold for linear operators. It may
happen as a special case that two linear operators ξ and η are such that ξη and ηξ
are equal. In this case we say that ξ commutes with η, or that ξ and η commute.

By repeated applications of the above processes of adding and multiplying 
linear operators, one can form sums and products of more than two of them, 
and one can proceed to build up an algebra with them. In this algebra 
the commutative axiom of multiplication does not hold, and also the product of 
two linear operators may vanish without either factor vanishing. But all the other 
axioms of ordinary algebra, including the associative and distributive axioms of 
multiplication, are valid, as may easily be verified. 
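
Both peculiarities of this algebra already show themselves with two-by-two matrices. The sketch below assumes the matrix picture of linear operators (a list of rows). The particular matrices are chosen only as an illustration.

```python
def matmul(a, b):
    """Product of two square matrices: first apply b, then a."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


alpha = [[0, 1], [0, 0]]   # assumed sample operators
beta = [[0, 0], [1, 0]]

alpha_beta = matmul(alpha, beta)    # operate first with beta, then with alpha
beta_alpha = matmul(beta, alpha)    # operate first with alpha, then with beta
alpha_alpha = matmul(alpha, alpha)  # product of alpha with itself

print(alpha_beta, beta_alpha, alpha_alpha)
```

Here the two products taken in opposite orders differ, and the product of alpha with itself vanishes although alpha does not.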

If we take a number k and multiply it into ket vectors, it appears as
a linear operator operating on ket vectors, the conditions (1) being fulfilled
with k substituted for α. A number is thus a special case of a linear operator.
It has the property that it commutes with all linear operators and this property 
distinguishes it from a general linear operator. 

So far we have considered linear operators operating only on ket vectors. 
We can give a meaning to their operating also on bra vectors, in the following way. 
Take the scalar product of any bra ⟨B| with the ket α|A⟩. This scalar product is
a number which depends linearly on |A⟩ and therefore, from the definition of bras,
it may be considered as the scalar product of |A⟩ with some bra. The bra thus
defined depends linearly on ⟨B|, so we may look upon it as the result of some linear
operator applied to ⟨B|. This linear operator is uniquely determined by the original
linear operator α and may reasonably be called the same linear operator operating
on a bra. In this way our linear operators are made capable of operating on 
bra vectors. 

A suitable notation to use for the resulting bra when α operates on the bra ⟨B|
is ⟨B|α, as in this notation the equation which defines ⟨B|α is

$$\{\langle B|\alpha\}|A\rangle = \langle B|\{\alpha|A\rangle\} \tag{3}$$

for any |A⟩, which simply expresses the associative axiom of multiplication for
the triple product of ⟨B|, α and |A⟩. We therefore make the general rule that
in a product of a bra and a linear operator, the bra must always be put on
the left. We can now write the triple product of ⟨B|, α and |A⟩ simply as
⟨B|α|A⟩ without brackets. It may easily be verified that the distributive axiom
of multiplication holds for products of bras and linear operators just as well as for
products of linear operators and kets.
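
Equation (3) can be verified in the matrix picture, where a bra is a row of numbers multiplying the matrix from the left. This is a sketch under that assumption. The list `bra_b` holds the components of the bra itself, and all names and numbers are hypothetical.

```python
def inner(bra, ket):
    """Scalar product of a bra (row of components) with a ket (column)."""
    return sum(b * k for b, k in zip(bra, ket))


def op_on_ket(op, ket):
    """The ket alpha|A>: matrix times column."""
    return [sum(op[i][j] * ket[j] for j in range(len(ket)))
            for i in range(len(op))]


def bra_on_op(bra, op):
    """The bra <B|alpha: row times matrix."""
    return [sum(bra[i] * op[i][j] for i in range(len(bra)))
            for j in range(len(op[0]))]


alpha = [[2, 1j], [-1j, 5]]   # assumed sample operator
bra_b = [1 - 2j, 3]           # components of the bra <B|
ket_a = [4, 1 + 1j]

left = inner(bra_on_op(bra_b, alpha), ket_a)    # {<B|alpha}|A>
right = inner(bra_b, op_on_ket(alpha, ket_a))   # <B|{alpha|A>}
```

The two numbers agree, so the brackets in the triple product may be dropped.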

There is one further kind of product which has a meaning in our scheme,
namely the product of a ket vector and a bra vector with the ket on the left,
such as |A⟩⟨B|. To examine this product, let us multiply it into an arbitrary
ket |P⟩, putting the ket on the right, and assume the associative axiom of
multiplication. The product is then |A⟩⟨B|P⟩, which is another ket, namely |A⟩
multiplied by the number ⟨B|P⟩, and this ket depends linearly on the ket |P⟩.
Thus |A⟩⟨B| appears as a linear operator that can operate on kets. It can also
operate on bras, its product with a bra ⟨Q| on the left being ⟨Q|A⟩⟨B|, which is
the number ⟨Q|A⟩ times the bra ⟨B|. The product |A⟩⟨B| is to be clearly*
distinguished from the product ⟨B|A⟩ of the same factors in the reverse order,
the latter product being, of course, a number.
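
The action of |A⟩⟨B| on a ket can be made concrete in the matrix picture. The sketch below assumes that the ket is a column and the bra a row of components, so that |A⟩⟨B| becomes the matrix of all products of one component of each. The names and numbers are illustrative only.

```python
def outer(ket, bra):
    """Matrix of the operator |A><B| from the components of |A> and <B|."""
    return [[k * b for b in bra] for k in ket]


def op_on_ket(op, ket):
    """Matrix times column."""
    return [sum(op[i][j] * ket[j] for j in range(len(ket)))
            for i in range(len(op))]


def inner(bra, ket):
    """Scalar product of a bra (row) with a ket (column)."""
    return sum(b * k for b, k in zip(bra, ket))


ket_a = [1, 2j]     # assumed sample components of |A>
bra_b = [3, -1j]    # components of the bra <B|
ket_p = [1j, 4]     # arbitrary ket |P>

via_operator = op_on_ket(outer(ket_a, bra_b), ket_p)   # {|A><B|}|P>
number = inner(bra_b, ket_p)                           # the number <B|P>
via_number = [number * a for a in ket_a]               # |A> times <B|P>
```

Both routes give the same ket, in agreement with the associative axiom assumed in the text.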

We now have a complete algebraic scheme involving three kinds of quantities, 
bra vectors, ket vectors and linear operators. They can be multiplied together in 
the various ways discussed above, and the associative and distributive axioms 
of multiplication always hold, but the commutative axiom of multiplication 
does not hold. In this general scheme we still have the rules of notation of
the preceding section, that any complete bracket expression, containing ⟨ on
the left and ⟩ on the right, denotes a number, while any incomplete bracket
expression, containing only ⟨ or ⟩, denotes a vector.

With regard to the physical significance of the scheme, we have already 
assumed that the bra vectors and ket vectors, or rather the directions of 
these vectors, correspond to the states of a dynamical system at a particular time. 
We now make the further assumption that the linear operators correspond to 
the dynamical variables at that time. By dynamical variables are meant quantities 
such as the coordinates and the components of velocity, momentum and angular 
momentum of particles, and functions of these quantities—in fact the variables in 
terms of which classical mechanics is built up. The new assumption requires that 
these quantities shall occur also in quantum mechanics, but with the surprising 
difference that they are now subject to an algebra in which the commutative axiom 
of multiplication does not hold. 

This different algebra for the dynamical variables is one of the most important 
ways in which quantum mechanics differs from classical mechanics. We shall see 
later on that, in spite of this fundamental difference, the dynamical variables of 
quantum mechanics still have many properties in common with their classical
counterparts and it will be possible to build up a theory of them closely analogous
to the classical theory and forming a beautiful generalization of it.

* [‘clearly’ substitutes for ‘sharply’.]

It is convenient to use the same letter to denote a dynamical variable and 
the corresponding linear operator. In fact, we may consider a dynamical variable 
and the corresponding linear operator to be both the same thing, without getting 
into confusion. 


8. Conjugate relations 
Our linear operators are complex quantities, since one can multiply them by 
complex numbers and get other quantities of the same nature. Hence they must 
correspond in general to complex dynamical variables, i.e. to complex functions of 
the coordinates, velocities, etc. We need some further development of the theory 
to see what kind of linear operator corresponds to a real dynamical variable. 

Consider the ket which is the conjugate imaginary of ⟨P|α. This ket depends
antilinearly on ⟨P| and thus depends linearly on |P⟩. It may therefore be
considered as the result of some linear operator operating on |P⟩. This linear
operator is called the adjoint of α and we shall denote it by ᾱ. With this notation,
the conjugate imaginary of ⟨P|α is ᾱ|P⟩.

In formula (7) of Chapter I put ⟨P|α for ⟨A| and its conjugate imaginary ᾱ|P⟩
for |A⟩. The result is

$$\langle B|\bar\alpha|P\rangle = \overline{\langle P|\alpha|B\rangle}. \tag{4}$$

This is a general formula holding for any ket vectors |B⟩, |P⟩ and any linear
operator α, and it expresses one of the most frequently used properties of
the adjoint.

Putting ᾱ for α in (4), we get

$$\langle B|\bar{\bar\alpha}|P\rangle = \overline{\langle P|\bar\alpha|B\rangle} = \langle B|\alpha|P\rangle,$$

by using (4) again with |P⟩ and |B⟩ interchanged. This holds for any ket |P⟩,
so we can infer from (7) of Chapter I,

$$\langle B|\bar{\bar\alpha} = \langle B|\alpha,$$

and since this holds for any bra vector ⟨B|, we can infer

$$\bar{\bar\alpha} = \alpha.$$

Thus the adjoint of the adjoint of a linear operator is the original linear operator.
This property of the adjoint makes it like the conjugate complex of a number,
and it is easily verified that in the special case when the linear operator is a number,
the adjoint linear operator is the conjugate complex number. Thus it is reasonable
to assume that the adjoint of a linear operator corresponds to the conjugate
complex of a dynamical variable. With this physical significance for the adjoint
of a linear operator, we may call the adjoint alternatively the conjugate complex
linear operator, which conforms with our notation ᾱ.
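
Formula (4) and the rule that the adjoint of the adjoint is the original operator can be checked numerically. The sketch assumes the finite matrix picture, in which the adjoint is the transposed matrix with every element replaced by its conjugate complex. That identification, the helper names and the sample numbers are assumptions of the illustration.

```python
def adjoint(op):
    """Adjoint of a square matrix: transpose and conjugate every element."""
    n = len(op)
    return [[complex(op[j][i]).conjugate() for j in range(n)] for i in range(n)]


def sandwich(b, op, p):
    """The number <B|op|P>, with <B| the conjugate imaginary of the ket b."""
    n = len(op)
    return sum(complex(b[i]).conjugate() * op[i][j] * p[j]
               for i in range(n) for j in range(n))


alpha = [[1 + 2j, 3], [-1j, 4 - 1j]]   # assumed sample operator
ket_b = [2, 1 + 1j]
ket_p = [1j, -3]

lhs = sandwich(ket_b, adjoint(alpha), ket_p)       # <B|alpha-bar|P>
rhs = sandwich(ket_p, alpha, ket_b).conjugate()    # conjugate of <P|alpha|B>
double_adjoint = adjoint(adjoint(alpha))           # should reproduce alpha
```

The two sides of (4) agree, and taking the adjoint twice returns the original matrix.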

A linear operator may equal its adjoint, and is then called self-adjoint. 
It corresponds to a real dynamical variable, so it may be called alternatively 
a real linear operator. Any linear operator may be split up into a real part and 
an imaginary† part. For this reason the words ‘conjugate complex’ are applicable
to linear operators and not the words ‘conjugate imaginary’. 


† [‘pure’ is omitted.]






The conjugate complex of the sum of two linear operators is obviously the sum
of their conjugate complexes. To get the conjugate complex of the product of two
linear operators α and β, we apply formula (7) of Chapter I with

$$\langle A| = \langle P|\alpha, \qquad \langle B| = \langle Q|\bar\beta,$$

so that

$$|A\rangle = \bar\alpha|P\rangle, \qquad |B\rangle = \beta|Q\rangle.$$

The result is

$$\langle Q|\bar\beta\bar\alpha|P\rangle = \overline{\langle P|\alpha\beta|Q\rangle} = \langle Q|\overline{\alpha\beta}|P\rangle$$

from (4). Since this holds for any |P⟩ and ⟨Q|, we can infer that

$$\bar\beta\bar\alpha = \overline{\alpha\beta}. \tag{5}$$

Thus the conjugate complex of the product of two linear operators equals the product
of the conjugate complexes of the factors in the reverse order.

As simple examples of this result, it should be noted that, if ξ and η are real,
in general ξη is not real. This is an important difference from classical mechanics.
However, ξη + ηξ is real, and so is i(ξη − ηξ). Only when ξ and η commute is ξη
itself also real. Further, if ξ is real, then so is ξ² and, more generally, ξ^n with n
any positive integer.
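
These statements about real operators can be tried out on two self-adjoint matrices that do not commute. The sketch assumes the finite matrix picture, with ‘real’ meaning equal to the conjugate transpose. The matrices `xi` and `eta` are arbitrary examples, not taken from the text.

```python
def matmul(a, b):
    """Product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def adjoint(op):
    """Transpose and conjugate every element."""
    n = len(op)
    return [[complex(op[j][i]).conjugate() for j in range(n)] for i in range(n)]


def is_real(op, tol=1e-12):
    """True when the operator equals its adjoint (a real linear operator)."""
    adj = adjoint(op)
    n = len(op)
    return all(abs(op[i][j] - adj[i][j]) < tol
               for i in range(n) for j in range(n))


xi = [[1, 1], [1, 0]]        # assumed real (self-adjoint) operator
eta = [[0, -1j], [1j, 2]]    # a second real operator, not commuting with xi

xi_eta = matmul(xi, eta)
eta_xi = matmul(eta, xi)
sym = [[xi_eta[i][j] + eta_xi[i][j] for j in range(2)] for i in range(2)]
i_comm = [[1j * (xi_eta[i][j] - eta_xi[i][j]) for j in range(2)]
          for i in range(2)]
xi_sq = matmul(xi, xi)

print(is_real(xi_eta), is_real(sym), is_real(i_comm), is_real(xi_sq))
```

The product by itself fails the test, while the symmetrized sum, i times the difference of the two products, and the square all pass it.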

We may get the conjugate complex of the product of three linear operators by
successive applications of the rule (5) for the conjugate complex of the product of
two of them. We have

$$\overline{\alpha\beta\gamma} = \overline{\alpha(\beta\gamma)} = \overline{\beta\gamma}\,\bar\alpha = \bar\gamma\bar\beta\bar\alpha, \tag{6}$$

so the conjugate complex of the product of three linear operators equals
the product of the conjugate complexes of the factors in the reverse order. The rule
may easily be extended to the product of any number of linear operators.

In the preceding section we saw that the product |A⟩⟨B| is a linear operator.
We may get its conjugate complex by referring directly to the definition of
the adjoint. Multiplying |A⟩⟨B| into a general bra ⟨P| we get ⟨P|A⟩⟨B|, whose
conjugate imaginary ket is

$$\overline{\langle P|A\rangle}\,|B\rangle = \langle A|P\rangle|B\rangle = |B\rangle\langle A|P\rangle.$$

Hence

$$\overline{|A\rangle\langle B|} = |B\rangle\langle A|. \tag{7}$$
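
Equation (7) can also be confirmed with explicit components. In this sketch both arguments of `outer` are kets, and the bra is formed inside the function by conjugation. This convention, like the sample numbers, is an assumption of the illustration.

```python
def outer(ket_left, ket_right):
    """Matrix of |A><B| built from the components of the kets |A> and |B>."""
    return [[a * complex(b).conjugate() for b in ket_right] for a in ket_left]


def adjoint(op):
    """Transpose and conjugate every element."""
    n = len(op)
    return [[complex(op[j][i]).conjugate() for j in range(n)] for i in range(n)]


ket_a = [1, 2j]          # assumed sample components
ket_b = [3 - 1j, 1j]

lhs = adjoint(outer(ket_a, ket_b))   # conjugate complex of |A><B|
rhs = outer(ket_b, ket_a)            # |B><A|
```

Element by element the two matrices coincide.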

We now have several rules concerning conjugate complexes and conjugate
imaginaries of products, namely equation (7) of Chapter I, equations (4),
(5), (6), (7) of this chapter, and the rule that the conjugate imaginary of ⟨P|α
is ᾱ|P⟩. These rules can all be summed up in a single comprehensive rule:
the conjugate complex or conjugate imaginary of any product of bra vectors,
ket vectors and linear operators is obtained by taking the conjugate complex or
conjugate imaginary of each factor and reversing the order of all the factors.
The rule is easily verified to hold quite generally, also for the cases not explicitly
given above.


THEOREM. If ξ is a real linear operator and

$$\xi^m|P\rangle = 0 \tag{8}$$

for a particular ket |P⟩, m being a positive integer, then ξ|P⟩ = 0.

To prove the theorem, take first the case when m = 2. Equation (8) then gives
⟨P|ξ²|P⟩ = 0, showing that the ket ξ|P⟩ multiplied by the conjugate imaginary
bra ⟨P|ξ is zero. From the assumption (8) of Chapter I with ξ|P⟩ for |A⟩, we see
that ξ|P⟩ must be zero. Thus the theorem is proved for m = 2.

Now take m > 2 and put

$$\xi^{m-2}|P\rangle = |Q\rangle.$$

Equation (8) now gives

$$\xi^2|Q\rangle = 0.$$

Applying the theorem for m = 2, we get

$$\xi|Q\rangle = 0$$

or

$$\xi^{m-1}|P\rangle = 0. \tag{9}$$

By repeating the process by which equation (9) is obtained from (8), we obtain
successively

$$\xi^{m-2}|P\rangle = 0, \quad \xi^{m-3}|P\rangle = 0, \quad \ldots, \quad \xi^2|P\rangle = 0, \quad \xi|P\rangle = 0,$$

and so the theorem is proved generally.
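
The step for m = 2 rests on the fact that, for a real ξ, the number ⟨P|ξ²|P⟩ is the squared length of ξ|P⟩. The sketch below checks this on an assumed self-adjoint matrix and then shows, with an assumed matrix that is not self-adjoint, that the conclusion of the theorem can fail when the reality condition is dropped. All names and numbers are illustrative.

```python
def apply(op, ket):
    """Matrix times column."""
    return [sum(row[j] * ket[j] for j in range(len(ket))) for row in op]


def inner(a, b):
    """Scalar product <A|B> from the components of the kets |A> and |B>."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


xi = [[1, 1j], [-1j, 1]]     # real (self-adjoint) operator
ket_q = [2, 1 - 1j]
xi_q = apply(xi, ket_q)
expect_sq = inner(ket_q, apply(xi, xi_q))   # <Q|xi^2|Q>
norm_sq = inner(xi_q, xi_q)                 # squared length of xi|Q>

alpha = [[0, 1], [0, 0]]     # not self-adjoint
ket_p = [0, 1]
alpha_p = apply(alpha, ket_p)               # alpha|P>, which does not vanish
alpha_sq_p = apply(alpha, alpha_p)          # alpha^2|P>, which vanishes
```

For the matrix `alpha` the square annihilates the ket although `alpha` itself does not, so the restriction to real linear operators in the theorem is essential.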


9. Eigenvalues and eigenvectors 

We must make a further development of the theory of linear operators, consisting
in studying the equation

$$\alpha|P\rangle = a|P\rangle, \tag{10}$$

where α is a linear operator and a is a number. This equation usually presents itself
in the form that α is a known linear operator and the number a and the ket |P⟩
are unknowns, which we have to try to choose so as to satisfy (10), ignoring
the trivial solution |P⟩ = 0. Equation (10) means that the linear operator α applied
to the ket |P⟩ just multiplies this ket by a numerical factor without changing its
direction, or else multiplies it by the factor zero, so that it ceases to have a direction.
This same α applied to other kets will, of course, in general change both their
lengths and their directions. It should be noticed that only the direction of |P⟩
is of importance in equation (10). If one multiplies |P⟩ by any number not zero,
it will not affect the question of whether (10) is satisfied or not.

Together with equation (10), we should consider also the conjugate imaginary
form of [that] equation

$$\langle Q|\alpha = b\langle Q|, \tag{11}$$

where b is a number. Here the unknowns are the number b and the non-zero bra ⟨Q|.
Equations (10) and (11) are of such fundamental importance in the theory that
it is desirable to have some special words to describe the relationships between
the quantities involved. If (10) is satisfied, we shall call a an eigenvalue† of
the linear operator α, or of the corresponding dynamical variable, and we shall
call |P⟩ an eigenket of the linear operator or dynamical variable. Further, we shall
say that the eigenket |P⟩ belongs to the eigenvalue a. Similarly, if (11) is satisfied,
we shall call b an eigenvalue of α and ⟨Q| an eigenbra belonging to this eigenvalue.
The words eigenvalue, eigenket, eigenbra have a meaning, of course, only with
reference to a linear operator or dynamical variable.

† The word ‘proper’ is sometimes used instead of ‘eigen’, but this is not satisfactory as
the words ‘proper’ and ‘improper’ are often used with other meanings. For example, in §§15
and 46 the words ‘improper function’ and ‘proper-energy’ are used.

Using this terminology, we can assert that, if an eigenket of α is multiplied by
any number not zero, the resulting ket is also an eigenket and belongs to the same
eigenvalue as the original one. It is possible to have two or more independent
eigenkets of a linear operator belonging to the same eigenvalue of that linear
operator, e.g. equation (10) may have several solutions, |P1⟩, |P2⟩, |P3⟩,... say,
all holding for the same value of a, with the various eigenkets |P1⟩, |P2⟩, |P3⟩,...
independent. In this case it is evident that any linear combination of the eigenkets
is another eigenket belonging to the same eigenvalue of the linear operator, e.g.

$$c_1|P1\rangle + c_2|P2\rangle + c_3|P3\rangle + \cdots$$

is another solution of (10), where c_1, c_2, c_3,... are any numbers.
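
A small numerical case makes this concrete. The sketch assumes a three-by-three matrix for α which has two independent eigenkets belonging to the eigenvalue 2. The matrix, the kets and the coefficients are all chosen only for the illustration.

```python
def apply(op, ket):
    """Matrix times column."""
    return [sum(row[j] * ket[j] for j in range(len(ket))) for row in op]


alpha = [[3, 1, 0], [1, 3, 0], [0, 0, 2]]   # assumed sample operator
p1 = [1, -1, 0]     # eigenket belonging to the eigenvalue 2
p2 = [0, 0, 1]      # independent eigenket belonging to the same eigenvalue
c1, c2 = 3 - 1j, 2j  # arbitrary numerical coefficients

combo = [c1 * x + c2 * y for x, y in zip(p1, p2)]   # c1|P1> + c2|P2>
image = apply(alpha, combo)                          # alpha applied to it
```

The operator multiplies the combination by 2, exactly as it does each of the two eigenkets separately.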

In the special case when the linear operator α of equations (10) and (11) is
a number, k say, it is obvious that any ket |P⟩ and bra ⟨Q| will satisfy these
equations provided a and b equal k. Thus a number considered as a linear operator
has just one eigenvalue, and any ket is an eigenket and any bra is an eigenbra,
belonging to this eigenvalue.

The theory of eigenvalues and eigenvectors of a linear operator α which is not
real is not of much use for quantum mechanics. We shall therefore confine ourselves
to real linear operators for the further development of the theory. Putting for α
the real linear operator ξ, we have instead of equations (10) and (11)

$$\xi|P\rangle = a|P\rangle, \tag{12}$$
$$\langle Q|\xi = b\langle Q|. \tag{13}$$

Three important results can now be readily deduced.

(i) The eigenvalues are all real numbers. To prove that a satisfying (12) is real,
we multiply (12) by the bra ⟨P| on the left, obtaining

$$\langle P|\xi|P\rangle = a\langle P|P\rangle.$$

Now from equation (4) with ⟨B| replaced by ⟨P| and α replaced by the real linear
operator ξ, we see that the number ⟨P|ξ|P⟩ must be real, and from (8) of §6,
⟨P|P⟩ must be real and not zero. Hence a is real. Similarly, by multiplying (13)
by |Q⟩ on the right, we can prove that b is real.

Suppose we have a solution of (12) and we form the conjugate imaginary
equation, which will read

$$\langle P|\xi = a\langle P|,$$

given that ξ and a are real.* This conjugate imaginary equation now provides a solution
of (13), with ⟨Q| = ⟨P| and b = a. Thus we can infer

(ii) The eigenvalues associated with eigenkets are the same as the eigenvalues
associated with eigenbras.

(iii) The conjugate imaginary of any eigenket is an eigenbra belonging to
the same eigenvalue, and conversely. This last result makes it reasonable to call
the state corresponding to any eigenket or to the conjugate imaginary eigenbra
an eigenstate of the real dynamical variable ξ.

* [Original: ‘in view of the reality of ξ and a.’]

Eigenvalues and eigenvectors of various real dynamical variables are used very
extensively in quantum mechanics, so it is desirable to have some systematic
notation for labelling them. The following is suitable for most purposes. If ξ is
a real dynamical variable, we call its eigenvalues ξ′, ξ″, ξ^r, etc. Thus we have a letter
by itself denoting a real dynamical variable or a real linear operator, and the same
letter with primes or an index attached denoting a number, namely an eigenvalue
of what the letter by itself denotes. An eigenvector may now be labelled by
the eigenvalue to which it belongs. Thus |ξ′⟩ denotes an eigenket belonging to
the eigenvalue ξ′ of the dynamical variable ξ. If in a piece of work we deal
with more than one eigenket belonging to the same eigenvalue of a dynamical
variable, we may distinguish them one from another by means of a further label,
or possibly of more than one further labels. Thus, if we are dealing with two
eigenkets belonging to the same eigenvalue of ξ, we may call them |ξ′1⟩ and |ξ′2⟩.

THEOREM. Two eigenvectors of a real dynamical variable belonging to different
eigenvalues are orthogonal.

To prove the theorem, let |ξ′⟩ and |ξ″⟩ be two eigenkets of the real dynamical
variable ξ, belonging to the eigenvalues ξ′ and ξ″ respectively. Then we have
the equations

$$\xi|\xi'\rangle = \xi'|\xi'\rangle, \tag{14}$$
$$\xi|\xi''\rangle = \xi''|\xi''\rangle. \tag{15}$$

Taking the conjugate imaginary of (14), we get

$$\langle\xi'|\xi = \xi'\langle\xi'|.$$

Multiplying this by |ξ″⟩ on the right gives

$$\langle\xi'|\xi|\xi''\rangle = \xi'\langle\xi'|\xi''\rangle,$$

and multiplying (15) by ⟨ξ′| on the left gives

$$\langle\xi'|\xi|\xi''\rangle = \xi''\langle\xi'|\xi''\rangle.$$

Hence, subtracting,

$$(\xi' - \xi'')\langle\xi'|\xi''\rangle = 0, \tag{16}$$

showing that, if ξ′ ≠ ξ″, ⟨ξ′|ξ″⟩ = 0 and the two eigenvectors |ξ′⟩ and |ξ″⟩ are
orthogonal. This theorem will be referred to as the orthogonality theorem.
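
Result (i) and the orthogonality theorem can both be seen at work in a two-by-two example. The sketch assumes a particular self-adjoint matrix whose eigenkets, belonging to the eigenvalues 1 and 4, were found by hand. The matrix, the kets and the helper `ratio` are hypothetical names introduced only here.

```python
def apply(op, ket):
    """Matrix times column."""
    return [sum(row[j] * ket[j] for j in range(len(ket))) for row in op]


def inner(a, b):
    """Scalar product <A|B> from the components of the kets |A> and |B>."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def ratio(op, ket):
    """The number <P|xi|P> / <P|P>, which equals a when |P> is an eigenket."""
    return inner(ket, apply(op, ket)) / inner(ket, ket)


xi = [[2, 1 - 1j], [1 + 1j, 3]]   # assumed real (self-adjoint) operator
ket_1 = [-1 + 1j, 1]              # eigenket belonging to the eigenvalue 1
ket_4 = [1 - 1j, 2]               # eigenket belonging to the eigenvalue 4

a_1 = ratio(xi, ket_1)
a_4 = ratio(xi, ket_4)
overlap = inner(ket_1, ket_4)     # scalar product of the two eigenvectors
```

The two ratios come out as the real numbers 1 and 4, and the scalar product of the two eigenvectors vanishes, as equation (16) requires when the eigenvalues differ.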

We have been discussing properties of the eigenvalues and eigenvectors of a real
linear operator, but have not yet considered the question of whether, for a given real
linear operator, any eigenvalues and eigenvectors exist, and if so, how to find them.
This question is in general very difficult to answer. There is one useful special case,
however, which is quite tractable, namely when the real linear operator, ξ say,
satisfies an algebraic equation

$$\phi(\xi) \equiv \xi^n + a_1\xi^{n-1} + a_2\xi^{n-2} + \cdots + a_n = 0, \tag{17}$$

the coefficients a being numbers. This equation means, of course, that the linear
operator φ(ξ) produces the result zero when applied to any ket vector or to any
bra vector.

Let (17) be the simplest algebraic equation that ξ satisfies. Then it will be
shown that

(α) The number of eigenvalues of ξ is n.

(β) There are so many eigenkets of ξ that any ket whatever can be expressed
as a sum of such eigenkets.
The algebraic form φ(ξ) can be factorized into n linear factors, the result being

$$\phi(\xi) \equiv (\xi - c_1)(\xi - c_2)(\xi - c_3)\cdots(\xi - c_n) \tag{18}$$

say, the c's being numbers, not assumed to be all different. This factorization can
be performed with ξ a linear operator just as well as with ξ an ordinary algebraic
variable, since there is nothing occurring in (18) that does not commute with ξ.
Let the quotient when φ(ξ) is divided by (ξ − c_r) be χ_r(ξ), so that

$$\phi(\xi) \equiv (\xi - c_r)\,\chi_r(\xi) \qquad (r = 1, 2, 3, \ldots, n).$$

Then, for any ket |P⟩,

$$(\xi - c_r)\,\chi_r(\xi)|P\rangle = \phi(\xi)|P\rangle = 0. \tag{19}$$

Now χ_r(ξ)|P⟩ cannot vanish for every ket |P⟩, as otherwise χ_r(ξ) itself would
vanish and we should have ξ satisfying an algebraic equation of degree n − 1,
which would contradict the assumption that (17) is the simplest equation that ξ
satisfies. If we choose |P⟩ so that χ_r(ξ)|P⟩ does not vanish, then equation (19)
shows that χ_r(ξ)|P⟩ is an eigenket of ξ, belonging to the eigenvalue c_r.
The argument holds for each value of r from 1 to n, and hence each of the c's
is an eigenvalue of ξ. No other number can be an eigenvalue of ξ, since if ξ′ is
any eigenvalue, belonging to an eigenket |ξ′⟩, ξ|ξ′⟩ = ξ′|ξ′⟩ and we can deduce
φ(ξ)|ξ′⟩ = φ(ξ′)|ξ′⟩, and since the left-hand side vanishes we must have φ(ξ′) = 0.

To complete the proof of (α) we must verify that the c's are all different.
Suppose the c's are not all different and c_s occurs m times say, with m > 1.
Then φ(ξ) is of the form

$$\phi(\xi) \equiv (\xi - c_s)^m\,\theta(\xi),$$

with θ(ξ) a rational integral function of ξ. Equation (17) now gives us

$$(\xi - c_s)^m\,\theta(\xi)|A\rangle = 0 \tag{20}$$

for any ket |A⟩. Since c_s is an eigenvalue of ξ it must be real, so that (ξ − c_s) is
a real linear operator. Equation (20) is now of the same form as equation (8) with
(ξ − c_s) for ξ and θ(ξ)|A⟩ for |P⟩. From the theorem connected with equation (8)
we can infer that

$$(\xi - c_s)\,\theta(\xi)|A\rangle = 0.$$

Since the ket |A⟩ is arbitrary,

$$(\xi - c_s)\,\theta(\xi) = 0,$$

which contradicts the assumption that (17) is the simplest equation that ξ satisfies.
Hence the c's are all different and (α) is proved.

Let χ_r(c_r) be the number obtained when c_r is substituted for ξ in the algebraic
expression χ_r(ξ). Since the c's are all different, χ_r(c_r) cannot vanish. Consider
now the expression

$$\sum_r \frac{\chi_r(\xi)}{\chi_r(c_r)} - 1. \tag{21}$$

If c_s is substituted for ξ here, every term in the sum vanishes except the one for
which r = s, since χ_r(ξ) contains (ξ − c_s) as a factor when r ≠ s, and the term for
which r = s is unity, so the whole expression vanishes. Thus the expression (21)
vanishes when ξ is put equal to any of the n numbers c_1, c_2,..., c_n. Since, however,
the expression is only of degree n − 1 in ξ, it must vanish identically. If we now
apply the linear operator (21) to an arbitrary ket |P⟩ and equate the result to zero,
we get

$$|P\rangle = \sum_r \frac{1}{\chi_r(c_r)}\,\chi_r(\xi)|P\rangle. \tag{22}$$

Each term in the sum on the right here is, according to (19), an eigenket of ξ,
if it does not vanish. Equation (22) thus expresses the arbitrary ket |P⟩ as a sum
of eigenkets of ξ, and thus (β) is proved.
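
The decomposition (22) can be carried out mechanically once the roots c_r are known. The following sketch assumes a self-adjoint two-by-two matrix satisfying (ξ − 1)(ξ − 4) = 0. The function names `shifted` and `eigen_components` and the sample ket are hypothetical, introduced only for the illustration.

```python
def apply(op, ket):
    """Matrix times column."""
    return [sum(row[j] * ket[j] for j in range(len(ket))) for row in op]


def shifted(op, c, ket):
    """Apply the operator (xi - c) to a ket."""
    image = apply(op, ket)
    return [x - c * y for x, y in zip(image, ket)]


def eigen_components(op, roots, ket):
    """The terms chi_r(xi)|P> / chi_r(c_r) of equation (22), one per root c_r."""
    parts = []
    for r, c_r in enumerate(roots):
        part, denom = list(ket), 1
        for s, c_s in enumerate(roots):
            if s != r:
                part = shifted(op, c_s, part)   # factor (xi - c_s) of chi_r(xi)
                denom *= c_r - c_s              # the same factor with c_r for xi
        parts.append([x / denom for x in part])
    return parts


xi = [[2, 1 - 1j], [1 + 1j, 3]]   # assumed operator with (xi - 1)(xi - 4) = 0
roots = [1, 4]
ket_p = [1, 2j]                   # arbitrary ket |P>

parts = eigen_components(xi, roots, ket_p)
total = [sum(column) for column in zip(*parts)]   # right-hand side of (22)
```

Each element of `parts` is an eigenket belonging to the corresponding root, and their sum reproduces the original ket.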

As a simple example we may consider a real linear operator σ that satisfies
the equation

$$\sigma^2 = 1. \tag{23}$$

Then σ has the two eigenvalues 1 and −1. Any ket |P⟩ can be expressed as

$$|P\rangle = \tfrac{1}{2}(1 + \sigma)|P\rangle + \tfrac{1}{2}(1 - \sigma)|P\rangle.$$

It is easily verified that the two terms on the right here are eigenkets of σ, belonging
to the eigenvalues 1 and −1 respectively, when they do not vanish.
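
The verification can also be done numerically. The sketch assumes one particular self-adjoint two-by-two matrix whose square is unity. The choice of matrix and of the ket is arbitrary and serves only as an illustration.

```python
def apply(op, ket):
    """Matrix times column."""
    return [sum(row[j] * ket[j] for j in range(len(ket))) for row in op]


sigma = [[0, -1j], [1j, 0]]   # assumed real operator with sigma^2 = 1
ket_p = [1, 1]                # arbitrary ket |P>

sigma_p = apply(sigma, ket_p)
plus = [(x + y) / 2 for x, y in zip(ket_p, sigma_p)]    # (1/2)(1 + sigma)|P>
minus = [(x - y) / 2 for x, y in zip(ket_p, sigma_p)]   # (1/2)(1 - sigma)|P>
recombined = [x + y for x, y in zip(plus, minus)]       # should give back |P>
```

Applying `sigma` to `plus` returns `plus`, applying it to `minus` returns `minus` with the sign reversed, and the two parts add up to the original ket.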


10. Observables 


We have made a number of assumptions about the way in which states 
and dynamical variables are to be represented mathematically in the theory. 
These assumptions are not, by themselves, laws of nature, but become laws 
of nature when we make some further assumptions that provide a physical 
interpretation of the theory. Such further assumptions must take the form 
of establishing connexions between the results of observations, on one hand, 
and the equations of the mathematical formalism on the other. 

When we make an observation we measure some dynamical variable. It is 
obvious physically that the result of such a measurement must always be 
a real number, so we should expect that any dynamical variable that we can 
measure must be a real dynamical variable. One might think one could 
measure a complex dynamical variable by measuring separately its real and†
imaginary parts. But this would involve two measurements or two observations, 
which would be all right in classical mechanics, but would not do in quantum 
mechanics, where two observations in general interfere with one another—it is 
not in general permissible to consider that two observations can be made exactly 
simultaneously, and if they are made in quick succession the first will usually 
disturb the state of the system and introduce an indeterminacy that will affect 
the second. We therefore have to restrict the dynamical variables that we can 
measure to be real, the condition for this in quantum mechanics being as given 
in §8. Not every real dynamical variable can be measured, however. A further 
restriction is needed, as we shall see later. 

We now make some assumptions for the physical interpretation of the theory.
If the dynamical system is in an eigenstate of a real dynamical variable ξ,
belonging to the eigenvalue ξ′, then a measurement of ξ will certainly give as result
the number ξ′. Conversely, if the system is in a state such that a measurement
of a real dynamical variable ξ is certain to give one particular result (instead
of giving one or other of several possible results according to a probability law,
as is in general the case), then the state is an eigenstate of ξ and the result
of the measurement is the eigenvalue of ξ to which this eigenstate belongs.
These assumptions are reasonable on account of the eigenvalues of real linear
operators being always real numbers.

† [‘pure’ omitted.]

Some of the immediate consequences of the assumptions will be noted.
If we have two or more eigenstates of a real dynamical variable ξ belonging to
the same eigenvalue ξ′, then any state formed by superposition of them will also be
an eigenstate of ξ belonging to the eigenvalue ξ′. We can infer that if we have two
or more states for which a measurement of ξ is certain to give the result ξ′, then for
any state formed by superposition of them a measurement of ξ will still be certain
to give the result ξ′. This gives us some insight into the physical significance
of superposition of states. Again, two eigenstates of ξ belonging to different
eigenvalues are orthogonal. We can infer that two states for which a measurement
of ξ is certain to give two different results are orthogonal. This gives us some
insight into the physical significance of orthogonal states.

When we measure a real dynamical variable ξ, the disturbance involved in
the act of measurement causes a jump in the state of the dynamical system.
From physical continuity, if we make a second measurement of the same dynamical
variable ξ immediately after the first, the result of the second measurement must
be the same as that of the first. Thus after the first measurement has been made,
there is no indeterminacy in the result of the second. Hence, after the first
measurement has been made, the system is in an eigenstate of the dynamical
variable ξ, the eigenvalue it belongs to being equal to the result of the first
measurement. This conclusion must still hold if the second measurement is 
not actually made. In this way we see that a measurement always causes 
the system to jump into an eigenstate of the dynamical variable that is 
being measured, the eigenvalue this eigenstate belongs to being equal to the result 
of the measurement. 

We can infer that, with the dynamical system in any state, any result of 
a measurement of a real dynamical variable is one of its eigenvalues. Conversely, 
every eigenvalue is a possible result of a measurement of the dynamical variable for 
some state of the system, since it is certainly the result if the state is an eigenstate 
belonging to this eigenvalue. This gives us the physical significance of eigenvalues. 
The set of eigenvalues of a real dynamical variable are just the possible results of 
measurements of that dynamical variable and the calculation of eigenvalues is for 
this reason an important problem. 

Another assumption we make connected with the physical interpretation
of the theory is that, if a certain real dynamical variable ξ is measured with
the system in a particular state, the states into which the system may jump on
account of the measurement are such that the original state is dependent on them.
Now these states into which the system may jump are all eigenstates of ξ, and hence
the original state is dependent on eigenstates of ξ. But the original state may
be any state, so we can conclude that any state is dependent on eigenstates
of ξ. If we define a complete set of states to be a set such that any state is
dependent on them, then our conclusion can be formulated: the eigenstates of ξ
form a complete set.

Not every real dynamical variable has sufficient eigenstates to form 
a complete set. Those whose eigenstates do not form complete sets are not 
quantities that can be measured. We obtain in this way a further condition 
that a dynamical variable has to satisfy in order that it shall be susceptible to 
measurement, in addition to the condition that it shall be real. We call a real 
dynamical variable whose eigenstates form a complete set an observable. Thus any 
quantity that can be measured is an observable. 

The question now presents itself—Can every observable be measured? 
The answer theoretically is yes. In practice it may be very awkward, or perhaps 
even beyond the ingenuity of the experimenter, to devise an apparatus which could 
measure some particular observable, but the theory always allows one to imagine 
that the measurement can be made. 

Let us examine mathematically the condition for a real dynamical variable ξ
to be an observable. Its eigenvalues may consist of a (finite or infinite) discrete set
of numbers, or alternatively, they may consist of all numbers in a certain range,
such as all numbers lying between a and b. In the former case, the condition that
any state is dependent on eigenstates of ξ is that any ket can be expressed as a sum
of eigenkets of ξ. In the latter case the condition needs modification, since one may
have an integral instead of a sum, i.e. a ket |P⟩ may be expressible as an integral
of eigenkets of ξ,

$$|P\rangle = \int |\xi'\rangle \, d\xi', \tag{24}$$

|ξ'⟩ being an eigenket of ξ belonging to the eigenvalue ξ' and the range of integration
being the range of eigenvalues, as such a ket is dependent on eigenkets of ξ.
Not every ket dependent on eigenkets of ξ can be expressed in the form of
the right-hand side of (24), since one of the eigenkets itself cannot, and more
generally any sum of eigenkets cannot. The condition for the eigenstates of ξ
to form a complete set must thus be formulated, that any ket |P⟩ can be expressed
as an integral plus a sum of eigenkets of ξ, i.e.

$$|P\rangle = \int |\xi'c\rangle \, d\xi' + \sum_r |\xi^r d\rangle, \tag{25}$$

where the |ξ'c⟩, $|\xi^r d\rangle$ are all eigenkets of ξ, the labels c and d being inserted
to distinguish them when the eigenvalues ξ' and $\xi^r$ are equal, and where the integral
is taken over the whole range of eigenvalues and the sum is taken over any selection
of them. If this condition is satisfied in the case when the eigenvalues of ξ consist
of a range of numbers, then ξ is an observable.

There is a more general case that sometimes occurs, namely the eigenvalues
of ξ may consist of a range of numbers together with a discrete set of numbers
lying outside the range. In this case the condition that ξ shall be an observable
is still that any ket shall be expressible in the form of the right-hand side of (25),
but the sum over r is now a sum over the discrete set of eigenvalues as well as
a selection of those in the range.

It is often very difficult to decide mathematically whether a particular 
real dynamical variable satisfies the condition for being an observable or not, 
because the whole problem of finding eigenvalues and eigenvectors is in general very 
difficult. However, we may have good reason on experimental grounds for believing 
that the dynamical variable can be measured and then we may reasonably assume 
that it is an observable even though the mathematical proof is missing. This is 
a thing we shall frequently do during the course of development of the theory, 
e.g. we shall assume the energy of any dynamical system to be always an observable, 
even though it is beyond the power of present-day mathematical analysis to prove 
it so except in [relatively] simple cases. 

In the special case when the real dynamical variable is a number, every state 
is an eigenstate and the dynamical variable is obviously an observable. 
Any measurement of it always gives the same result, so it is just a physical constant, 
like the charge on an electron. A physical constant in quantum mechanics may 
thus be looked upon either as an observable with a single eigenvalue or as a mere 
number appearing in the equations, the two points of view being equivalent. 

If the real dynamical variable satisfies an algebraic equation,
then the result (β) of the preceding section shows that the dynamical variable is
an observable. Such an observable has a finite number of eigenvalues. Conversely,
any observable with a finite number of eigenvalues satisfies an algebraic equation,
since if the observable ξ has as its eigenvalues ξ', ξ'', ..., $\xi^n$, then

$$(\xi - \xi')(\xi - \xi'') \cdots (\xi - \xi^n)\,|P\rangle = 0$$

holds for |P⟩ any eigenket of ξ, and thus it holds for any |P⟩ whatever, because any
ket can be expressed as a sum of eigenkets of ξ on account of ξ being an observable.
Hence

$$(\xi - \xi')(\xi - \xi'') \cdots (\xi - \xi^n) = 0. \tag{26}$$

As an example we may consider the linear operator |A⟩⟨A|, where |A⟩ is
a normalized ket. This linear operator is real according to (7), and its square is

$$\{|A\rangle\langle A|\}^2 = |A\rangle\langle A|A\rangle\langle A| = |A\rangle\langle A| \tag{27}$$

since ⟨A|A⟩ = 1. Thus its square equals itself and so it satisfies an algebraic
equation and is an observable. Its eigenvalues are 1 and 0, with |A⟩ as the eigenket
belonging to the eigenvalue 1 and all kets orthogonal to |A⟩ as eigenkets belonging
to the eigenvalue 0. A measurement of the observable thus certainly gives
the result 1 if the dynamical system is in the state corresponding to |A⟩ and
the result 0 if the system is in any orthogonal state, so the observable may be
described as the quantity which determines whether the system is in the state |A⟩
or not.
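
The properties just stated can be checked numerically. The following is only a sketch under stated assumptions: the ket space is taken to be three-dimensional, the normalized ket A and the orthogonal ket B are chosen arbitrarily, and the helper names (outer, matmul, apply_op, close) are illustrative and do not come from the text.

```python
import math

S = 1 / math.sqrt(2)

# Assumed example kets in a 3-dimensional space (not taken from the text).
A = [complex(S), complex(0, S), complex(0)]    # normalized: <A|A> = 1
B = [complex(S), complex(0, -S), complex(0)]   # orthogonal to A: <A|B> = 0


def outer(ket):
    """Matrix of |ket><ket|: entry (i, j) is ket[i] * conj(ket[j])."""
    return [[a * b.conjugate() for b in ket] for a in ket]


def matmul(m, n):
    """Product of two square matrices given as nested lists."""
    size = len(m)
    return [[sum(m[i][k] * n[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def apply_op(m, ket):
    """Result of the linear operator m acting on ket."""
    return [sum(row[k] * ket[k] for k in range(len(ket))) for row in m]


def close(u, v, tol=1e-12):
    """True when two vectors agree component by component within tol."""
    return all(abs(a - b) < tol for a, b in zip(u, v))


P = outer(A)
P_squared = matmul(P, P)

# Equation (27): the operator equals its own square.
is_idempotent = all(close(r, s) for r, s in zip(P_squared, P))
# |A> belongs to the eigenvalue 1, an orthogonal ket to the eigenvalue 0.
keeps_A = close(apply_op(P, A), A)
kills_B = close(apply_op(P, B), [0, 0, 0])
```

All three flags should come out true for any normalized A and any B orthogonal to it; the particular components used here carry no special significance.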

Before concluding this section we should examine the conditions for an integral
such as occurs in (24) to be significant. Suppose |X⟩ and |Y⟩ are two kets which
can be expressed as integrals of eigenkets of the observable ξ,

$$|X\rangle = \int |\xi'x\rangle \, d\xi', \qquad |Y\rangle = \int |\xi''y\rangle \, d\xi'',$$

x and y being used as labels to distinguish the two integrands. Then we have,
taking the conjugate imaginary of the first equation and multiplying by the second,

$$\langle X|Y\rangle = \iint \langle\xi'x|\xi''y\rangle \, d\xi' \, d\xi''. \tag{28}$$

Consider now the single integral

$$\int \langle\xi'x|\xi''y\rangle \, d\xi''. \tag{29}$$

From the orthogonality theorem, the integrand here must vanish over the whole
range of integration except the one point ξ'' = ξ'. If the integrand is finite at
this point, the integral (29) vanishes, and if this holds for all ξ', we get from (28)
that ⟨X|Y⟩ vanishes. Now in general ⟨X|Y⟩ does not vanish, so in general
⟨ξ'x|ξ'y⟩ must be infinitely great in such a way as to make (29) non-vanishing
and finite. The form of infinity required for this will be discussed in §15.

In our work up to the present it has been implied that our bra and ket vectors 
are of finite length and their scalar products are finite. We see now the need for 
relaxing this condition when we are dealing with eigenvectors of an observable 
whose eigenvalues form a range. If we did not relax it, the phenomenon of 
ranges of eigenvalues could not occur and our theory would be too weak for most 
practical problems. 

Taking |Y⟩ = |X⟩ above, we get the result that in general ⟨ξ'x|ξ'x⟩ is
infinitely great. We shall assume that if |ξ'x⟩ ≠ 0

$$\int \langle\xi'x|\xi''x\rangle \, d\xi'' > 0, \tag{30}$$

as the axiom corresponding to (8) of §6 for vectors of infinite length.

The space of bra or ket vectors when the vectors are restricted to be of 
finite length and to have finite scalar products is called by mathematicians 
a Hilbert space. The bra and ket vectors that we now use form a more general 
space than a Hilbert space. 

We can now see that the expansion of a ket |P⟩ in the form of the right-hand side
of (25) is unique, provided there are not two or more terms in the sum referring
to the same eigenvalue. To prove this result, let us suppose that two different
expansions of |P⟩ are possible. Then by subtracting one from the other, we get
an equation of the form

$$0 = \int |\xi'a\rangle \, d\xi' + \sum_s |\xi^s b\rangle, \tag{31}$$

a and b being used as new labels for the eigenvectors, and the sum over s including
all terms left after the subtraction of one sum from the other. If there is a term in
the sum in (31) referring to an eigenvalue $\xi^t$ not in the range, we get, by multiplying
(31) on the left by $\langle\xi^t b|$ and using the orthogonality theorem,

$$0 = \langle\xi^t b|\xi^t b\rangle,$$

which contradicts (8) of §6. Again, if the integrand in (31) does not vanish for
some eigenvalue ξ'' not equal to any $\xi^s$ occurring in the sum, we get, by multiplying
(31) on the left by ⟨ξ''a| and using the orthogonality theorem,

$$0 = \int \langle\xi''a|\xi'a\rangle \, d\xi',$$

which contradicts (30). Finally, if there is a term in the sum in (31) referring to
an eigenvalue $\xi^t$ in the range, we get, multiplying (31) on the left by $\langle\xi^t b|$,

$$0 = \int \langle\xi^t b|\xi'a\rangle \, d\xi' + \langle\xi^t b|\xi^t b\rangle \tag{32}$$

and multiplying (31) on the left by $\langle\xi^t a|$,

$$0 = \int \langle\xi^t a|\xi'a\rangle \, d\xi' + \langle\xi^t a|\xi^t b\rangle. \tag{33}$$

Now the integral in (33) is finite, so $\langle\xi^t a|\xi^t b\rangle$ is finite and $\langle\xi^t b|\xi^t a\rangle$ is finite.
The integral in (32) must then be zero, so $\langle\xi^t b|\xi^t b\rangle$ is zero and we again have
a contradiction. Thus every term in (31) must vanish and the expansion of a ket
|P⟩ in the form of the right-hand side of (25) must be unique.


11. Functions of observables 

Let ξ be an observable. We can multiply it by any real number k and get another
observable kξ. In order that our theory may be self-consistent it is necessary that,
when the system is in a state such that a measurement of the observable ξ
certainly gives the result ξ', a measurement of the observable kξ shall certainly
give the result kξ'. It is easily verified that this condition is fulfilled. The ket
corresponding to a state for which a measurement of ξ certainly gives the result ξ'
is an eigenket of ξ, |ξ'⟩ say, satisfying

$$\xi\,|\xi'\rangle = \xi'\,|\xi'\rangle.$$

This equation leads to

$$k\xi\,|\xi'\rangle = k\xi'\,|\xi'\rangle,$$

showing that |ξ'⟩ is an eigenket of kξ belonging to the eigenvalue kξ', and thus that
a measurement of kξ will certainly give the result kξ'.

More generally, we may take any real function of ξ, f(ξ) say, and consider it
as a new observable which is automatically measured whenever ξ is measured,
since an experimental determination of the value of ξ also provides the value
of f(ξ). We need not restrict f(ξ) to be real, and then its real and* imaginary
parts are two observables which are automatically measured when ξ is measured.
For the theory to be consistent it is necessary that, when the system is in a state
such that a measurement of ξ certainly gives the result ξ', a measurement of the
real and* imaginary parts of f(ξ) shall certainly give for results the real and*
imaginary parts of f(ξ'). In the case when f(ξ) is expressible as a power series

$$f(\xi) = c_0 + c_1\xi + c_2\xi^2 + c_3\xi^3 + \cdots,$$

the c's being numbers, this condition can again be verified by elementary algebra.
In the case of more general functions f it may not be possible to verify
the condition. The condition may then be used to define f(ξ), which we have
not yet defined mathematically. In this way we can get a more general definition
of a function of an observable than is provided by power series.

*‘pure’ omitted.

We define f(ξ) in general to be that linear operator which satisfies

$$f(\xi)\,|\xi'\rangle = f(\xi')\,|\xi'\rangle \tag{34}$$

for every eigenket |ξ'⟩ of ξ, f(ξ') being a number for each eigenvalue ξ'. It is
easily seen that this definition is self-consistent when applied to eigenkets |ξ'⟩
that are not independent. If we have an eigenket |ξ'A⟩ dependent on other
eigenkets of ξ, these other eigenkets must all belong to the same eigenvalue ξ',
otherwise we should have an equation of the type (31), which we have seen is
impossible. On multiplying the equation which expresses |ξ'A⟩ linearly in terms of
the other eigenkets of ξ by f(ξ) on the left, we merely multiply each term in it by
the number f(ξ'), so we obviously get a consistent equation. Further, equation (34)
is sufficient to define the linear operator f(ξ) completely, since to get the result of
f(ξ) multiplied into an arbitrary ket |P⟩, we have only to expand |P⟩ in the form
of the right-hand side of (25) and take

$$f(\xi)\,|P\rangle = \int f(\xi')\,|\xi'c\rangle \, d\xi' + \sum_r f(\xi^r)\,|\xi^r d\rangle. \tag{35}$$
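
The prescription (35) may be illustrated for an observable with discrete eigenvalues only, so that the integral is absent and only the sum remains. What follows is a minimal sketch under assumptions of our own choosing: ξ is taken to have the matrix [[0, 1], [1, 0]] in a two-dimensional ket space, its eigenkets are written down in closed form, and the names EIGEN, inner and apply_function are hypothetical.

```python
import math

S = 1 / math.sqrt(2)

# Assumed example: xi has the matrix [[0, 1], [1, 0]], whose normalized
# eigenkets and eigenvalues are known in closed form.
EIGEN = [(1.0, [S, S]), (-1.0, [S, -S])]   # (eigenvalue, eigenket) pairs


def inner(bra, ket):
    """Scalar product <bra|ket> (the bra components are conjugated)."""
    return sum(complex(b).conjugate() * k for b, k in zip(bra, ket))


def apply_function(f, ket):
    """f(xi)|P>: expand |P> in eigenkets, weight each term by f(eigenvalue)."""
    out = [0j] * len(ket)
    for value, vec in EIGEN:
        coeff = inner(vec, ket)            # component of |P> along this eigenket
        out = [o + f(value) * coeff * v for o, v in zip(out, vec)]
    return out


def close(u, v, tol=1e-12):
    """True when two vectors agree component by component within tol."""
    return all(abs(a - b) < tol for a, b in zip(u, v))


P = [0.6, 0.8j]

identity_fn = apply_function(lambda x: x, P)      # reproduces xi|P>
square_fn = apply_function(lambda x: x * x, P)    # xi squared is 1 here
step_fn = apply_function(lambda x: 1.0 if x > 0 else 0.0, P)  # not a power series
```

The last call uses a step function, which has no power-series form, yet the definition by eigenkets handles it in the same way as the polynomial cases.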


The conjugate complex $\overline{f(\xi)}$ of f(ξ) is defined by the conjugate imaginary
equation to (34), namely

$$\langle\xi'|\,\overline{f(\xi)} = \bar f(\xi')\,\langle\xi'|,$$

holding for any eigenbra ⟨ξ'|, $\bar f(\xi')$ being the conjugate complex function to f(ξ').
Let us replace ξ' here by ξ'' and multiply the equation on the right by the arbitrary
ket |P⟩. Then we get, using the expansion (25) for |P⟩,

$$\begin{aligned}
\langle\xi''|\,\overline{f(\xi)}\,|P\rangle &= \bar f(\xi'')\,\langle\xi''|P\rangle \\
&= \int \bar f(\xi'')\,\langle\xi''|\xi'c\rangle \, d\xi' + \sum_r \bar f(\xi'')\,\langle\xi''|\xi^r d\rangle \\
&= \int \bar f(\xi'')\,\langle\xi''|\xi'c\rangle \, d\xi' + \bar f(\xi'')\,\langle\xi''|\xi''d\rangle
\end{aligned} \tag{36}$$

with the help of the orthogonality theorem, ⟨ξ''|ξ''d⟩ being understood to be
zero if ξ'' is not one of the eigenvalues to which the terms in the sum in
(25) refer. Again, putting the conjugate complex function $\bar f(\xi')$ for f(ξ') in (35)
and multiplying on the left by ⟨ξ''|, we get

$$\langle\xi''|\,\bar f(\xi)\,|P\rangle = \int \bar f(\xi')\,\langle\xi''|\xi'c\rangle \, d\xi' + \bar f(\xi'')\,\langle\xi''|\xi''d\rangle.$$

The right-hand side here equals that of (36), since the integrands vanish for ξ' ≠ ξ'',
and hence

$$\langle\xi''|\,\overline{f(\xi)}\,|P\rangle = \langle\xi''|\,\bar f(\xi)\,|P\rangle.$$

This holds for ⟨ξ''| any eigenbra and |P⟩ any ket, so

$$\overline{f(\xi)} = \bar f(\xi). \tag{37}$$

Thus the conjugate complex of the linear operator f(ξ) is the conjugate complex
function $\bar f$ of ξ.

It follows as a corollary that if f(ξ') is a real function of ξ', f(ξ) is a real linear
operator. f(ξ) is then also an observable, since its eigenstates form a complete set,
every eigenstate of ξ being also an eigenstate of f(ξ).

With the above definition we are able to give a meaning to any function f 
of an observable, provided only that the domain of existence of the function of 
a real variable f(x) includes all the eigenvalues of the observable. If the domain of 
existence contains other points besides these eigenvalues, then the values of f(x) for 
these other points will not affect the function of the observable. The function need 
not be analytic or continuous. The eigenvalues of a function f of an observable 
are just the function f of the eigenvalues of the observable. 

It is important to observe that the possibility of defining a function f of
an observable requires the existence of a unique number f(x) for each value of
x which is an eigenvalue of the observable. Thus the function f(x) must be
single-valued. This may be illustrated by considering the question: When we have
an observable f(A) which is a real function of the observable A, is the observable A
a function of the observable f(A)? The answer to this is yes, if different eigenvalues
A’ of A always lead to different values of f(A’). If, however, there exist two different 
eigenvalues of A, A’ and A” say, such that f(A’) = f(A”), then, corresponding to 
the eigenvalue f(A’) of the observable f(A), there will not be a unique eigenvalue 
of the observable A and the latter will not be a function of the observable f(A). 

It may easily be verified mathematically, from the definition, that the sum or
product of two functions of an observable is a function of that observable and that
a function of a function of an observable is a function of that observable. Also it is
easily seen that the whole theory of functions of an observable is symmetrical
between bras and kets and that we could equally well work from the equation

$$\langle\xi'|\,f(\xi) = f(\xi')\,\langle\xi'| \tag{38}$$

instead of from (34).


We shall conclude this section with a discussion of two examples which
are of great practical importance, namely the reciprocal and the square root.
The reciprocal of an observable exists if the observable does not have
the eigenvalue zero. If the observable α does not have the eigenvalue zero,
the reciprocal observable, which we call $\alpha^{-1}$ or 1/α, will satisfy

$$\alpha^{-1}\,|\alpha'\rangle = \alpha'^{-1}\,|\alpha'\rangle, \tag{39}$$

where |α'⟩ is an eigenket of α belonging to the eigenvalue α'. Hence

$$\alpha\alpha^{-1}\,|\alpha'\rangle = \alpha\,\alpha'^{-1}\,|\alpha'\rangle = |\alpha'\rangle.$$

Since this holds for any eigenket |α'⟩, we must have

$$\alpha\alpha^{-1} = 1. \tag{40}$$

Similarly,

$$\alpha^{-1}\alpha = 1. \tag{41}$$

Either of these equations is sufficient to determine $\alpha^{-1}$ completely, provided α does
not have the eigenvalue zero. To prove this in the case of (40), let x be any linear
operator satisfying the equation

$$\alpha x = 1$$

and multiply both sides on the left by the $\alpha^{-1}$ defined by (39). The result is

$$\alpha^{-1}\alpha x = \alpha^{-1}$$

and hence from (41)

$$x = \alpha^{-1}.$$

Equations (40) and (41) can be used to define the reciprocal, when it exists,
of a general linear operator α, which need not even be real. One of these equations
by itself is then not necessarily sufficient. If any two linear operators α and β have
reciprocals, their product αβ has the reciprocal

$$(\alpha\beta)^{-1} = \beta^{-1}\alpha^{-1}, \tag{42}$$

obtained by taking the reciprocal of each factor and reversing their order. We verify
(42) by noting that its right-hand side gives unity when multiplied by αβ, either on
the right or on the left. This reciprocal law for products can be immediately
extended to more than two factors, i.e.,

$$(\alpha\beta\gamma\cdots)^{-1} = \cdots\gamma^{-1}\beta^{-1}\alpha^{-1}.$$

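
The reversal of order in (42) can be seen at work on a pair of small matrices. This is a sketch only: the two 2 × 2 matrices are arbitrary non-commuting examples of general linear operators (they are not real in the sense of the text), and the helper names matmul and inverse are our own.

```python
def matmul(m, n):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse(m):
    """Reciprocal of a 2x2 matrix; fails if the determinant vanishes."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ZeroDivisionError("operator has no reciprocal")
    return [[d / det, -b / det], [-c / det, a / det]]


# Assumed example operators; they do not commute with each other.
alpha = [[2.0, 1.0], [0.0, 1.0]]
beta = [[1.0, 0.0], [3.0, 1.0]]

lhs = inverse(matmul(alpha, beta))                   # (alpha beta)^-1
rhs = matmul(inverse(beta), inverse(alpha))          # beta^-1 alpha^-1
wrong_order = matmul(inverse(alpha), inverse(beta))  # alpha^-1 beta^-1

# The right-hand side of (42) gives unity when multiplied by alpha beta.
unity = matmul(matmul(alpha, beta), rhs)
```

Here lhs and rhs agree, while wrong_order differs from both, which shows that the reversal of the factors matters whenever they do not commute.
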
The square root of an observable α always exists, and is real if α has no negative
eigenvalues. We write it √α or $\alpha^{1/2}$. It satisfies

$$\sqrt{\alpha}\,|\alpha'\rangle = \pm\sqrt{\alpha'}\,|\alpha'\rangle, \tag{43}$$

|α'⟩ being an eigenket of α belonging to the eigenvalue α'. Hence

$$\sqrt{\alpha}\sqrt{\alpha}\,|\alpha'\rangle = \sqrt{\alpha'}\sqrt{\alpha'}\,|\alpha'\rangle = \alpha'\,|\alpha'\rangle = \alpha\,|\alpha'\rangle,$$

and since this holds for any eigenket |α'⟩ we must have

$$\sqrt{\alpha}\sqrt{\alpha} = \alpha. \tag{44}$$

On account of the ambiguity of sign in (43) there will be several square roots.
To fix one of them we must specify a particular sign in (43) for each eigenvalue.
This sign may vary irregularly from one eigenvalue to the next and equation (43)
will always define a linear operator √α satisfying (44) and forming a square-root
function of α. If there is an eigenvalue of α with two or more independent
eigenkets belonging to it, then we must, according to our definition of a function,
have the same sign in (43) for each of these eigenkets. If we took different signs,
however, equation (44) would still hold, and hence equation (44) by itself is
not sufficient to define √α, except in the special case when there is only one
independent eigenket of α belonging to any eigenvalue.

The number of different square roots of an observable is $2^n$, where n is the total
number of eigenvalues not zero. In practice the square root function is used only
for observables without negative eigenvalues and the particular square root that is
useful is the one for which the positive sign is always taken in (43). This one will
be called the positive square root.
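
The counting of square roots can be illustrated by enumeration. The sketch below assumes an observable with the three distinct eigenvalues 4, 9 and 0, none negative, and represents each function of the observable simply by the list of values it takes on those eigenvalues; the names square_roots and positive_root are illustrative.

```python
import itertools
import math

# Assumed distinct eigenvalues of the observable (none negative).
eigenvalues = [4.0, 9.0, 0.0]


def square_roots(values):
    """All sign choices in (43), one independent sign per non-zero eigenvalue."""
    choices = [(math.sqrt(v), -math.sqrt(v)) if v != 0 else (0.0,)
               for v in values]
    return [list(choice) for choice in itertools.product(*choices)]


roots = square_roots(eigenvalues)
n_nonzero = sum(1 for v in eigenvalues if v != 0)

# Every choice of signs satisfies (44) eigenvalue by eigenvalue.
all_square_back = all(
    all(r * r == v for r, v in zip(root, eigenvalues)) for root in roots
)

# The positive square root takes the positive sign throughout.
positive_root = [math.sqrt(v) for v in eigenvalues]
```

With two non-zero eigenvalues the enumeration yields four roots, in agreement with the count $2^n$; the zero eigenvalue contributes no choice of sign.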




12. The general physical interpretation 

The assumptions that we made at the beginning of §10 to get a physical
interpretation of the mathematical theory are of a rather special kind, 
since they can be used only in connexion with eigenstates. We need some more 
general assumption which will enable us to extract physical information from 
the mathematics even when we are not dealing with eigenstates. 

In classical mechanics an observable always, as we say, ‘has a value’ for any
particular state of the system. What is there in quantum mechanics corresponding
to this? If we take any observable ξ and any two states x and y, corresponding to
the vectors ⟨x| and |y⟩, then we can form the number ⟨x|ξ|y⟩. This number
is not very closely analogous to the value which an observable can ‘have’ in
the classical theory, for three reasons, namely, (i) it refers to two states of
the system, while the classical value always refers to one, (ii) it is in general
not a real number, and (iii) it is not uniquely determined by the observable
and the states, since the vectors ⟨x| and |y⟩ contain arbitrary numerical factors.
Even if we impose on ⟨x| and |y⟩ the condition that they shall be normalized,
there will still be an undetermined factor of modulus unity in ⟨x|ξ|y⟩. These three
reasons cease to apply, however, if we take the two states to be identical and
|y⟩ to be the conjugate imaginary vector to ⟨x|. The number that we then get,
namely ⟨x|ξ|x⟩, is necessarily real, and also it is uniquely determined when ⟨x|
is normalized, since if we multiply ⟨x| by the numerical factor $e^{ic}$, c being some
real number, we must multiply |x⟩ by $e^{-ic}$ and ⟨x|ξ|x⟩ will be unaltered.

One might thus be inclined to make the tentative assumption that
the observable ξ ‘has the value’ ⟨x|ξ|x⟩ for the state x, in a sense analogous to
the classical sense. This would not be satisfactory, though, for the following reason.
Let us take a second observable η, which would have by the above assumption
the value ⟨x|η|x⟩ for this same state. We should then expect, from classical
analogy, that for this state the sum of the two observables would have a value
equal to the sum of the values of the two observables separately and the product
of the two observables would have a value equal to the product of the values of
the two observables separately. Actually, the tentative assumption would give for
the sum of the two observables the value ⟨x|ξ + η|x⟩, which is, in fact, equal to
the sum of ⟨x|ξ|x⟩ and ⟨x|η|x⟩, but for the product it would give the value
⟨x|ξη|x⟩ or ⟨x|ηξ|x⟩, neither of which is connected in any simple way with ⟨x|ξ|x⟩
and ⟨x|η|x⟩.

However, since things go wrong only with the product and not with the sum,
it would be reasonable to call ⟨x|ξ|x⟩ the average value of the observable ξ for
the state x. This is because the average of the sum of two quantities must
equal the sum of their averages, but the average of their product need not equal
the product of their averages. We therefore make the general assumption that
if the measurement of the observable ξ for the system in the state corresponding
to |x⟩ is made a large number of times, the average of all the results obtained will
be ⟨x|ξ|x⟩, provided |x⟩ is normalized. If |x⟩ is not normalized, as is necessarily
the case if the state x is an eigenstate of some observable belonging to an eigenvalue
in a range, the assumption becomes that the average result of a measurement of ξ
is proportional to ⟨x|ξ|x⟩. This general assumption provides a basis for a general
physical interpretation of the theory.
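
The behaviour of sums and products under averaging can be made concrete. The sketch below rests on assumptions of our own: a two-dimensional ket space, two non-commuting real operators XI and ETA chosen for convenience, and a normalized ket x with components 0.6 and 0.8; the function names are illustrative.

```python
# Assumed example observables (they do not commute) and normalized state.
XI = [[0.0, 1.0], [1.0, 0.0]]
ETA = [[1.0, 0.0], [0.0, -1.0]]
x = [0.6, 0.8]


def average(op, ket):
    """<x|op|x> for a normalized ket with real or complex components."""
    n = len(ket)
    return sum(complex(ket[i]).conjugate() * op[i][j] * ket[j]
               for i in range(n) for j in range(n))


def add(m, n):
    """Sum of two matrices of equal size."""
    return [[a + b for a, b in zip(row_m, row_n)] for row_m, row_n in zip(m, n)]


def matmul(m, n):
    """Product of two square matrices given as nested lists."""
    size = len(m)
    return [[sum(m[i][k] * n[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


avg_xi = average(XI, x)
avg_eta = average(ETA, x)
avg_sum = average(add(XI, ETA), x)          # equals avg_xi + avg_eta
avg_product = average(matmul(XI, ETA), x)   # bears no simple relation to the above
product_of_averages = avg_xi * avg_eta
```

For this choice the average of the sum agrees with the sum of the averages, while the number formed from the product ξη differs from the product of the averages. The product ξη is here not even a real operator, which is the situation the text's footnote on products has in view.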

The expression that an observable ‘has a particular value’ for a particular state
is permissible in quantum mechanics in the special case when a measurement
of the observable is certain to lead to the particular value, so that the state is
an eigenstate of the observable. It may easily be verified from the algebra that,
with this restricted meaning for an observable ‘having a value’, if two observables
have values for a particular state, then for this state the sum of the two observables
(if this sum is an observable†) has a value equal to the sum of the values of
the two observables separately and the product of the two observables (if this
product is an observable‡) has a value equal to the product of the values of the two
observables separately.

In the general case we cannot speak of an observable having a value for 
a particular state, but we can speak of its having an average value for the state. 
We can go further and speak of the probability of its having any specified value 
for the state, meaning the probability of this specified value being obtained when 
one makes a measurement of the observable. This probability can be obtained 
from the general assumption in the following way. 

Let the observable be ξ and let the state correspond to the normalized ket |x⟩.
Then the general assumption tells us, not only that the average value of ξ is
⟨x|ξ|x⟩, but also that the average value of any function of ξ, f(ξ) say, is ⟨x|f(ξ)|x⟩.
Take f(ξ) to be that function of ξ which is equal to unity when ξ = a, a being
some real number, and zero otherwise. This function of ξ has a meaning according
to our general theory of functions of an observable, and it may be denoted by $\delta_{\xi a}$,
in conformity with the general notation of the symbol δ with two suffixes given
on p. 52 (equation (17) [of §16]). The average value of this function of ξ is just
the probability, $P_a$ say, of ξ having the value a. Thus

$$P_a = \langle x|\,\delta_{\xi a}\,|x\rangle. \tag{45}$$

If a is not an eigenvalue of ξ, $\delta_{\xi a}$ multiplied into any eigenket of ξ is zero, and hence
$\delta_{\xi a} = 0$ and $P_a = 0$. This agrees with a conclusion of §10, that any result of
a measurement of an observable must be one of its eigenvalues.
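
Formula (45) can be tried out on a small example with discrete eigenvalues. The sketch assumes the same kind of two-dimensional model as before: ξ has eigenvalues +1 and −1 with the eigenkets listed in EIGEN, the state is the normalized ket x, and the function name probability is our own.

```python
import math

S = 1 / math.sqrt(2)

# Assumed example: eigenvalues and normalized eigenkets of xi = [[0, 1], [1, 0]].
EIGEN = [(1.0, [S, S]), (-1.0, [S, -S])]
x = [0.6, 0.8]   # normalized state


def inner(bra, ket):
    """Scalar product <bra|ket> (the bra components are conjugated)."""
    return sum(complex(b).conjugate() * k for b, k in zip(bra, ket))


def probability(a, ket):
    """P_a = <x|delta_{xi a}|x>: the delta function keeps only eigenkets
    whose eigenvalue equals a, each contributing |<eigenket|x>|^2."""
    return sum(abs(inner(vec, ket)) ** 2 for value, vec in EIGEN if value == a)


p_plus = probability(1.0, x)
p_minus = probability(-1.0, x)
p_other = probability(0.5, x)    # 0.5 is not an eigenvalue, so this vanishes

# The average value is recovered as the sum of eigenvalue times probability.
average = sum(value * probability(value, x) for value, _ in EIGEN)
```

The two probabilities come out close to 0.98 and 0.02, they add up to unity, and the weighted sum reproduces the average ⟨x|ξ|x⟩ = 0.96 of this model.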

If the possible results of a measurement of ξ form a range of numbers,
the probability of ξ having exactly a particular value will be zero in most
physical problems. The quantity of physical importance is then the probability
of ξ having a value within a small range, say from a to a + da. This probability,
which we may call P(a) da, is equal to the average value of that function of ξ
which is equal to unity for ξ lying within the range a to a + da and zero otherwise.
This function of ξ has a meaning according to our general theory of functions of
an observable. Denoting it by χ(ξ), we have

$$P(a)\,da = \langle x|\,\chi(\xi)\,|x\rangle. \tag{46}$$

If the range a to a + da does not include any eigenvalues of ξ, we have as above
χ(ξ) = 0 and P(a) = 0. If |x⟩ is not normalized, the right-hand sides of (45) and
(46) will still be proportional to the probability of ξ having the value a and lying
within the range a to a + da respectively.

†This is not obviously so, since the sum may not have sufficient eigenstates to form
a complete set, in which case the sum, considered as a single quantity, would not be measurable.

‡Here the reality condition [that the observables be sets of real numbers] may fail, as well as
the condition for the eigenstates to form a complete set.

The assumption of §10, that a measurement of ξ is certain to give the result ξ'
if the system is in an eigenstate of ξ belonging to the eigenvalue ξ', is consistent
with the general assumption for physical interpretation and can in fact be deduced
from it. Working from the general assumption we see that, if |ξ'⟩ is an eigenket of
ξ belonging to the eigenvalue ξ', then, in the case of discrete eigenvalues of ξ,

$$\delta_{\xi a}\,|\xi'\rangle = 0 \quad \text{unless } a = \xi',$$

and in the case of a range of eigenvalues of ξ

$$\chi(\xi)\,|\xi'\rangle = 0 \quad \text{unless the range } a \text{ to } a + da \text{ includes } \xi'.$$

In either case, for the state corresponding to |ξ'⟩ the probability of ξ having any
value other than ξ' is zero.

An eigenstate of ξ belonging to an eigenvalue ξ' lying in a range is a state which
cannot strictly be realized in practice, since it would need an infinite amount of
precision to get ξ to equal exactly ξ'. The most that could be attained in practice
would be to get ξ to lie within a narrow range about the value ξ'. The system
would then be in a state approximating to an eigenstate of ξ. Thus an eigenstate
belonging to an eigenvalue in a range is a mathematical idealization of what can
be attained in practice. All the same such eigenstates play a very useful role in
the theory and one could not very well do without them. Science contains many
examples of theoretical concepts which are limits of things met with in practice
and are useful for the precise formulation of laws of nature, although they are
not realizable experimentally, and this is just one more of them. It may be that
the infinite length of the ket vectors corresponding to these eigenstates is connected
with their unrealizability, and that all realizable states correspond to ket vectors
that can be normalized and that form a Hilbert space.


13. Commutability and compatibility 
A state may be simultaneously an eigenstate of two observables. If the state
corresponds to the ket vector |A⟩ and the observables are ξ and η, we should then
have the equations

$$\xi\,|A\rangle = \xi'\,|A\rangle, \qquad \eta\,|A\rangle = \eta'\,|A\rangle,$$

where ξ' and η' are eigenvalues of ξ and η respectively. We can now deduce

$$\xi\eta\,|A\rangle = \xi\eta'\,|A\rangle = \xi'\eta'\,|A\rangle = \xi'\eta\,|A\rangle = \eta\xi'\,|A\rangle = \eta\xi\,|A\rangle,$$

or

$$(\xi\eta - \eta\xi)\,|A\rangle = 0.$$

This suggests that the chances for the existence of a simultaneous eigenstate are
most favourable if ξη − ηξ = 0 and the two observables commute. If they do not
commute a simultaneous eigenstate is not impossible, but is rather exceptional.
On the other hand, if they do commute there exist so many simultaneous
eigenstates that they form a complete set, as will now be proved.

Let ξ and η be two commuting observables. Take an eigenket of η, |η'⟩ say,
belonging to the eigenvalue η', and expand it in terms of eigenkets of ξ in the form
of the right-hand side of (25), thus

$$|\eta'\rangle = \int |\xi'\eta'c\rangle \, d\xi' + \sum_r |\xi^r\eta'd\rangle. \tag{47}$$

The eigenkets of ξ on the right-hand side here have η' inserted in them as
an extra label, in order to remind us that they come from the expansion of
a special ket vector, namely |η'⟩, and not a general one as in equation (25). We can
now show that each of these eigenkets of ξ is also an eigenket of η belonging to
the eigenvalue η'. We have

$$0 = (\eta - \eta')\,|\eta'\rangle = \int (\eta - \eta')\,|\xi'\eta'c\rangle \, d\xi' + \sum_r (\eta - \eta')\,|\xi^r\eta'd\rangle. \tag{48}$$

Now the ket $(\eta - \eta')\,|\xi^r\eta'd\rangle$ satisfies

$$\xi(\eta - \eta')\,|\xi^r\eta'd\rangle = (\eta - \eta')\,\xi\,|\xi^r\eta'd\rangle = (\eta - \eta')\,\xi^r\,|\xi^r\eta'd\rangle = \xi^r(\eta - \eta')\,|\xi^r\eta'd\rangle,$$

showing that it is an eigenket of ξ belonging to the eigenvalue $\xi^r$, and similarly
the ket (η − η')|ξ'η'c⟩ is an eigenket of ξ belonging to the eigenvalue ξ'.
Equation (48) thus gives an integral plus a sum of eigenkets of ξ equal to zero,
which, as we have seen with equation (31), is impossible unless the integrand and
every term in the sum vanishes. Hence

$$(\eta - \eta')\,|\xi'\eta'c\rangle = 0, \qquad (\eta - \eta')\,|\xi^r\eta'd\rangle = 0,$$

so that all the kets appearing on the right-hand side of (47) are eigenkets of η
as well as of ξ. Equation (47) now gives |η'⟩ expanded in terms of simultaneous
eigenkets of ξ and η. Since any ket can be expanded in terms of eigenkets |η'⟩ of η,
it follows that any ket can be expanded in terms of simultaneous eigenkets of ξ
and η, and thus the simultaneous eigenstates form a complete set.
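
The theorem may be illustrated by a small finite example. The sketch below assumes a three-dimensional ket space and two commuting real operators XI and ETA chosen by us, of which XI has a repeated eigenvalue; the listed simultaneous eigenkets and the helper names are likewise illustrative.

```python
import math

S = 1 / math.sqrt(2)

# Assumed commuting observables; XI has the eigenvalue 1 twice over.
XI = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
ETA = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 3.0]]

# (ket, eigenvalue of XI, eigenvalue of ETA) for each simultaneous eigenket.
SIMULTANEOUS = [
    ([S, S, 0.0], 1.0, 1.0),
    ([S, -S, 0.0], 1.0, -1.0),
    ([0.0, 0.0, 1.0], 2.0, 3.0),
]


def matmul(m, n):
    """Product of two square matrices given as nested lists."""
    size = len(m)
    return [[sum(m[i][k] * n[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def apply_op(m, ket):
    """Result of the linear operator m acting on ket."""
    return [sum(row[k] * ket[k] for k in range(len(ket))) for row in m]


def close(u, v, tol=1e-12):
    """True when two vectors agree component by component within tol."""
    return all(abs(a - b) < tol for a, b in zip(u, v))


commute = matmul(XI, ETA) == matmul(ETA, XI)

are_eigenkets = all(
    close(apply_op(XI, ket), [xi_val * c for c in ket])
    and close(apply_op(ETA, ket), [eta_val * c for c in ket])
    for ket, xi_val, eta_val in SIMULTANEOUS
)

# Expand an arbitrary ket in the simultaneous eigenkets (all real here,
# so no complex conjugation is needed in the scalar products).
P = [0.2, -0.5, 0.7]
expansion = [0.0, 0.0, 0.0]
for ket, _, _ in SIMULTANEOUS:
    coeff = sum(a * b for a, b in zip(ket, P))
    expansion = [e + coeff * c for e, c in zip(expansion, ket)]
is_complete = close(expansion, P)
```

Not every eigenket of XI belonging to the repeated eigenvalue is an eigenket of ETA; the theorem asserts only that a complete set of simultaneous eigenkets can be found, as the particular choice above shows.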

The above simultaneous eigenkets of ξ and η, |ξ'η'c⟩ and $|\xi^r\eta'd\rangle$, are labelled by
the eigenvalues ξ' and η', or $\xi^r$ and η', to which they belong, together with the labels
c and d which may also be necessary. The procedure of using eigenvalues as labels
for simultaneous eigenvectors will be generally followed in the future, just as it has
been followed in the past for eigenvectors of single observables.

The converse to the above theorem says that, if ξ and η are two observables such
that their simultaneous eigenstates form a complete set, then ξ and η commute.
To prove this, we note that, if |ξ'η'⟩ is a simultaneous eigenket belonging to
the eigenvalues ξ' and η',

$$(\xi\eta - \eta\xi)\,|\xi'\eta'\rangle = (\xi'\eta' - \eta'\xi')\,|\xi'\eta'\rangle = 0. \tag{49}$$

Since the simultaneous eigenstates form a complete set, an arbitrary ket |P⟩ can
be expanded in terms of simultaneous eigenkets |ξ'η'⟩, for each of which (49) holds,
and hence

$$(\xi\eta - \eta\xi)\,|P\rangle = 0$$

and so

$$\xi\eta - \eta\xi = 0.$$

The idea of simultaneous eigenstates may be extended to more than two
observables and the above theorem and its converse still hold, i.e. if any set of
observables commute, each with all the others, their simultaneous eigenstates form
a complete set, and conversely. The same arguments used for the proof with two
observables are adequate for the general case; e.g., if we have three commuting
observables ξ, η, ζ, we can expand any simultaneous eigenket of ξ and η in terms
of eigenkets of ζ and then show that each of these eigenkets of ζ is also an eigenket
of ξ and of η. Thus the simultaneous eigenket of ξ and η is expanded in terms
of simultaneous eigenkets of ξ, η and ζ, and since any ket can be expanded in
terms of simultaneous eigenkets of ξ and η, it can also be expanded in terms of
simultaneous eigenkets of ξ, η and ζ.

The orthogonality theorem applied to simultaneous eigenkets tells us that 
two simultaneous eigenvectors of a set of commuting observables are orthogonal 
if the sets of eigenvalues to which they belong differ in any way. 

Owing to the simultaneous eigenstates of two or more commuting observables
forming a complete set, we can set up a theory of functions of two or more
commuting observables on the same lines as the theory of functions of a single
observable given in §11. If ξ, η, ζ, ... are commuting observables, we define
a general function f of them to be that linear operator f(ξ, η, ζ, ...) which satisfies

$$f(\xi, \eta, \zeta, \ldots)\,|\xi'\eta'\zeta'\ldots\rangle = f(\xi', \eta', \zeta', \ldots)\,|\xi'\eta'\zeta'\ldots\rangle, \tag{50}$$

where |ξ'η'ζ'...⟩ is any simultaneous eigenket of ξ, η, ζ, ... belonging to
the eigenvalues ξ', η', ζ', .... Here f is any function such that f(a, b, c, ...) is
defined for all values of a, b, c, ... which are eigenvalues of ξ, η, ζ, ... respectively.
As with a function of a single observable defined by (34), we can show that
f(ξ, η, ζ, ...) is completely determined by (50), that

$$\overline{f(\xi, \eta, \zeta, \ldots)} = \bar f(\xi, \eta, \zeta, \ldots),$$

corresponding to (37), and that if f(a, b, c, ...) is a real function, f(ξ, η, ζ, ...) is
real and is an observable.

We can now proceed to generalize the results (45) and (46). Given a set of
commuting observables ξ, η, ζ, ..., we may form that function of them which is
equal to unity when ξ = a, η = b, ζ = c, ..., a, b, c, ... being real numbers,
and is equal to zero when any of these conditions is not fulfilled. This function
may be written δ_{ξa}δ_{ηb}δ_{ζc}..., and is in fact just the product in any order of
the factors δ_{ξa}, δ_{ηb}, δ_{ζc}, ... defined as functions of single observables, as may be
seen by substituting this product for f(ξ, η, ζ, ...) in the left-hand side of (50).
The average value of this function for any state is the probability, P_{abc...} say,
of ξ, η, ζ, ... having the values a, b, c, ... respectively for that state. Thus if
the state corresponds to the normalized ket vector |x⟩, we get from our general
assumption for physical interpretation

$$P_{abc\ldots} = \langle x|\,\delta_{\xi a}\,\delta_{\eta b}\,\delta_{\zeta c}\ldots\,|x\rangle. \tag{51}$$

P_{abc...} is zero unless each of the numbers a, b, c, ... is an eigenvalue of
the corresponding observable. If any of the numbers a, b, c, ... is an eigenvalue
in a range of eigenvalues of the corresponding observable, P_{abc...} will usually again
be zero, but in this case we ought to replace the requirement that this observable
shall have exactly one value by the requirement that it shall have a value lying
within a small range, which involves replacing one of the δ factors in (51) by a factor
like the χ(ξ) of equation (46). On carrying out such a replacement for each of
the observables ξ, η, ζ, ..., whose corresponding numerical value a, b, c, ... lies in
a range of eigenvalues, we shall get a probability which does not in general vanish.
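
As a rough illustration of (51) in the simplest setting, the following Python sketch assumes a four-dimensional toy system with two commuting observables whose simultaneous eigenkets are labelled by pairs (ξ', η'). The labels, the amplitudes and the function name `joint_probability` are all hypothetical choices made for the example and do not come from the text. In such a basis the product δ_{ξa}δ_{ηb} simply selects the basic kets with ξ' = a and η' = b.

```python
# Toy model (assumed for illustration): simultaneous eigenkets of two commuting
# observables, labelled by their eigenvalue pairs (xi', eta').
labels = [(1, -1), (1, +1), (2, -1), (2, +1)]

# Components <xi' eta'|x> of a normalized state |x> (illustrative numbers only).
amps = [0.5, 0.5j, -0.5, 0.5]

def joint_probability(a, b):
    """P_ab = <x| delta_{xi a} delta_{eta b} |x>: sum of |<xi' eta'|x>|^2
    over the simultaneous eigenkets with xi' = a and eta' = b."""
    return sum(abs(c) ** 2
               for (xi, eta), c in zip(labels, amps)
               if xi == a and eta == b)

# The probabilities over all pairs of eigenvalues add up to unity.
total = sum(joint_probability(a, b) for a in (1, 2) for b in (-1, +1))

# A number that is not an eigenvalue gives zero probability.
p_not_eigen = joint_probability(3, +1)
```

The second check mirrors the statement that P_{abc...} vanishes unless each of a, b, c, ... is an eigenvalue of the corresponding observable.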

If certain observables commute, there exist states for which they all have
particular values, in the sense explained on p. 39, namely the simultaneous eigenstates.
Thus one can give a meaning to several commuting observables having values 
at the same time. Further, we see from (51) that for any state one can give 
a meaning to the probability of particular results being obtained for simultaneous 
measurements of several commuting observables. This conclusion is an important 
new development. In general one cannot make an observation on a system in 
a definite state without disturbing that state and spoiling it for the purposes of 
a second observation. One cannot then give any meaning to the two observations 
being made simultaneously. The above conclusion tells us, though, that in 
the special case when the two observables commute, the observations are to be 
considered as non-interfering or compatible, in such a way that one can give 
a meaning to the two observations being made simultaneously and can discuss 
the probability of any particular results being obtained. The two observations may, 
in fact, be considered as a single observation of a more complicated type, the result 
of which is expressible by two numbers instead of a single number. From the point 
of view of general theory, any two or more commuting observables may be counted 
as a single observable, the result of a measurement of which consists of two or 
more numbers. The states for which this measurement is certain to lead to one 
particular result are the simultaneous eigenstates. 




III. REPRESENTATIONS


14. Basic vectors 

IN the preceding chapters we set up an algebraic scheme involving certain abstract 
quantities of three kinds, namely bra vectors, ket vectors and linear operators, 
and we expressed some of the fundamental laws of quantum mechanics in terms 
of them. It would be possible to continue to develop the theory in terms of 
these abstract quantities and to use them for applications to particular problems. 
However, for some purposes it is more convenient to replace the abstract quantities 
by sets of numbers with analogous mathematical properties and to work in terms of 
these sets of numbers. The procedure is similar to using coordinates in geometry, 
and has the advantage of giving one greater mathematical power for the solving 
of particular problems. 

The way in which the abstract quantities are to be replaced by numbers is 
not unique, there being many possible ways corresponding to the many systems of 
coordinates one can have in geometry. Each of these ways is called a representation 
and the set of numbers that replace an abstract quantity is called the representative 
of that abstract quantity in the representation. Thus the representative of 
an abstract quantity corresponds to the coordinates of a geometrical object. 
When one has a particular problem to work out in quantum mechanics, one can 
minimize the labour by using a representation in which the representatives of 
the more important abstract quantities occurring in that problem are as simple 
as possible. 

To set up a representation in a general way, we take a complete set of 
bra vectors, i.e. a set such that any bra can be expressed linearly in terms of 
them (as a sum or an integral or possibly an integral plus a sum). These bras 
we call the basic bras of the representation. They are sufficient, as we shall see, 
to fix the representation completely. 

Take any ket |a⟩ and form its scalar product with each of the basic bras.
The numbers so obtained constitute the representative of |a⟩. They are sufficient
to determine the ket |a⟩ completely, since if there is a second ket, |a₁⟩ say, for which
these numbers are the same, the difference |a⟩ − |a₁⟩ will have its scalar product
with any basic bra vanishing, and hence its scalar product with any bra whatever
will vanish and |a⟩ − |a₁⟩ itself will vanish.

We may suppose the basic bras to be labelled by one or more parameters,
λ₁, λ₂, ..., λᵤ, each of which may take on certain numerical values. The basic
bras will then be written ⟨λ₁λ₂...λᵤ| and the representative of |a⟩ will be written
⟨λ₁λ₂...λᵤ|a⟩. This representative will now consist of a set of numbers, one for
each set of values that λ₁, λ₂, ..., λᵤ may have in their respective domains.
Such a set of numbers just forms a function of the variables λ₁, λ₂, ..., λᵤ.
Thus the representative of a ket may be looked upon either as a set of numbers or
as a function of the variables used to label the basic bras.

If the number of independent states of our dynamical system is finite, equal to
n say, it is sufficient to take n basic bras, which may be labelled by a single
parameter λ taking on the values 1, 2, 3, ..., n. The representative of any ket |a⟩
now consists of the set of n numbers ⟨1|a⟩, ⟨2|a⟩, ⟨3|a⟩, ..., ⟨n|a⟩, which are
precisely the coordinates of the vector |a⟩ referred to a system of coordinates in
the usual way. The idea of the representative of a ket vector is just a generalization
of the idea of the coordinates of an ordinary vector and reduces to the latter when
the number of dimensions of the space of the ket vectors is finite.
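
The finite case can be made concrete with a small Python sketch. It assumes n = 2 and a particular orthonormal pair of basic vectors; both choices, and the names `scalar_product`, `basic` and `ket_a`, are hypothetical and serve only as an illustration. The representative is the list of scalar products with the basic bras, and the ket can be rebuilt from it.

```python
from math import sqrt

def scalar_product(bra_as_ket, ket):
    """<b|a>, with the bra <b| specified through its conjugate imaginary ket |b>."""
    return sum(b.conjugate() * a for b, a in zip(bra_as_ket, ket))

s = 1.0 / sqrt(2.0)
# Assumed basic vectors, labelled by lambda = 1, 2 (an illustrative orthonormal pair).
basic = [[s, s], [s, -s]]

ket_a = [3 + 0j, 1j]

# The representative of |a>: the numbers <1|a>, <2|a>.
representative = [scalar_product(k, ket_a) for k in basic]

# Because the basic vectors here are orthonormal, the ket is recovered as
# the sum over lambda of |lambda> <lambda|a>.
rebuilt = [sum(representative[n] * basic[n][i] for n in range(2))
           for i in range(2)]
```

Two kets with the same representative therefore coincide, which is the uniqueness statement made above for a general complete set of basic bras.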

In a general representation there is no need for the basic bras to be all 
independent. In most representations used in practice, however, they are all 
independent, and also satisfy the more stringent condition that any two of them 
are orthogonal. The representation is then called an orthogonal representation. 

Take an orthogonal representation with basic bras ⟨λ₁λ₂...λᵤ|, labelled by
parameters λ₁, λ₂, ..., λᵤ, whose domains are all real. Take a ket |a⟩ and form
its representative ⟨λ₁λ₂...λᵤ|a⟩. Now form the numbers λ₁⟨λ₁λ₂...λᵤ|a⟩ and
consider them as the representative of a new ket |b⟩. This is permissible since
the numbers forming the representative of a ket are independent, on account of
the basic bras being independent. The ket |b⟩ is defined by the equation

$$\langle \lambda_1\lambda_2\ldots\lambda_u | b\rangle = \lambda_1 \langle \lambda_1\lambda_2\ldots\lambda_u | a\rangle.$$

The ket |b⟩ is evidently a linear function of the ket |a⟩, so it may be considered
as the result of a linear operator applied to |a⟩. Calling this linear operator L₁,
we have
$$|b\rangle = L_1 |a\rangle$$
and hence
$$\langle \lambda_1\lambda_2\ldots\lambda_u | L_1 | a\rangle = \lambda_1 \langle \lambda_1\lambda_2\ldots\lambda_u | a\rangle.$$
This equation holds for any ket |a⟩, so we get
$$\langle \lambda_1\lambda_2\ldots\lambda_u | L_1 = \lambda_1 \langle \lambda_1\lambda_2\ldots\lambda_u |. \tag{1}$$

Equation (1) may be looked upon as the definition of the linear operator L₁.
It shows that each basic bra is an eigenbra of L₁, the value of the parameter λ₁
being the eigenvalue belonging to it.

From the condition that the basic bras are orthogonal we can deduce that L₁
is real and is an observable. Let λ₁', λ₂', ..., λᵤ' and λ₁'', λ₂'', ..., λᵤ'' be two sets of
values for the parameters λ₁, λ₂, ..., λᵤ. We have, putting the λ' for the λ in (1)
and multiplying on the right by |λ₁''λ₂''...λᵤ''⟩, the conjugate imaginary of the basic
bra ⟨λ₁''λ₂''...λᵤ''|,
$$\langle \lambda_1'\lambda_2'\ldots\lambda_u' | L_1 | \lambda_1''\lambda_2''\ldots\lambda_u''\rangle = \lambda_1' \langle \lambda_1'\lambda_2'\ldots\lambda_u' | \lambda_1''\lambda_2''\ldots\lambda_u''\rangle.$$
Interchanging the λ' and the λ'',
$$\langle \lambda_1''\lambda_2''\ldots\lambda_u'' | L_1 | \lambda_1'\lambda_2'\ldots\lambda_u'\rangle = \lambda_1'' \langle \lambda_1''\lambda_2''\ldots\lambda_u'' | \lambda_1'\lambda_2'\ldots\lambda_u'\rangle.$$

On account of the basic bras being orthogonal, the right-hand sides here vanish
unless λᵣ'' = λᵣ' for all r from 1 to u, in which case the right-hand sides are equal,
and they are also real, λ₁' being real. Thus, whether the λ'' are equal to the λ'
or not,
$$\langle \lambda_1'\lambda_2'\ldots\lambda_u' | L_1 | \lambda_1''\lambda_2''\ldots\lambda_u''\rangle = \overline{\langle \lambda_1''\lambda_2''\ldots\lambda_u'' | L_1 | \lambda_1'\lambda_2'\ldots\lambda_u'\rangle}$$
$$= \langle \lambda_1'\lambda_2'\ldots\lambda_u' | \bar{L}_1 | \lambda_1''\lambda_2''\ldots\lambda_u''\rangle$$
from equation (4) of §8. Since the ⟨λ₁'λ₂'...λᵤ'| form a complete set of
bras and the |λ₁''λ₂''...λᵤ''⟩ form a complete set of kets, we can infer that
L₁ = L̄₁. The further condition required for L₁ to be an observable, namely that
its eigenstates shall form a complete set, is obviously satisfied since it has as
eigenbras the basic bras, which form a complete set.

We can similarly introduce linear operators L₂, L₃, ..., Lᵤ by multiplying
⟨λ₁λ₂...λᵤ|a⟩ by the factors λ₂, λ₃, ..., λᵤ in turn and considering the resulting
sets of numbers as representatives of kets. Each of these L's can be shown in
the same way to have the basic bras as eigenbras and to be real and an observable.
The basic bras are simultaneous eigenbras of all the L's. Since these simultaneous
eigenbras form a complete set, it follows from a theorem* of §13 that any two of
the L's commute.
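
The construction of the L's can be sketched in a few lines of Python. The example assumes two parameters λ₁, λ₂, each taking the values 0 and 1, and an arbitrary representative; these values and the helper name `apply_L` are hypothetical, chosen only to illustrate equation (1). Each L acts on a representative by multiplying it by the corresponding label, so any two of them commute.

```python
# Assumed representative <lambda_1 lambda_2|a>, stored as a mapping from
# the label pair (lambda_1, lambda_2) to a number (illustrative values only).
rep_a = {(0, 0): 1 + 2j, (0, 1): -1j, (1, 0): 0.5, (1, 1): 2.0}

def apply_L(r, rep):
    """Representative of L_r|a>: each number <lambda_1 lambda_2|a> is
    multiplied by the label lambda_r, as in equation (1)."""
    return {labels: labels[r - 1] * value for labels, value in rep.items()}

# Applying L_1 and L_2 in either order gives the same representative.
l1_l2 = apply_L(1, apply_L(2, rep_a))
l2_l1 = apply_L(2, apply_L(1, rep_a))
```

Since a ket is fixed by its representative, equality of the two results for every |a⟩ amounts to L₁L₂ = L₂L₁ in this toy setting.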

It will now be shown that, if ξ₁, ξ₂, ..., ξᵤ are any set of commuting
observables, we can set up an orthogonal representation in which the basic bras
are simultaneous eigenbras of ξ₁, ξ₂, ..., ξᵤ. Let us suppose first that there is
only one independent simultaneous eigenbra of ξ₁, ξ₂, ..., ξᵤ belonging to any
set of eigenvalues ξ₁', ξ₂', ..., ξᵤ'. Then we may take these simultaneous eigenbras,
with arbitrary numerical coefficients, as our basic bras. They are all orthogonal
on account of the orthogonality theorem (any two of them will have at least
one eigenvalue different, which is sufficient to make them orthogonal) and there
are sufficient of them to form a complete set, from a result of §13. They may
conveniently be labelled by the eigenvalues ξ₁', ξ₂', ..., ξᵤ' to which they belong,
so that one of them is written ⟨ξ₁'ξ₂'...ξᵤ'|.

Passing now to the general case when there are several independent
simultaneous eigenbras of ξ₁, ξ₂, ..., ξᵤ belonging to some sets of eigenvalues,
we must pick out from all the simultaneous eigenbras belonging to a set of
eigenvalues ξ₁', ξ₂', ..., ξᵤ' a complete subset, the members of which are all
orthogonal to one another. (The condition of completeness here means that any
simultaneous eigenbra belonging to the eigenvalues ξ₁', ξ₂', ..., ξᵤ' can be expressed
linearly in terms of the members of the subset.) We must do this for each set
of eigenvalues ξ₁', ξ₂', ..., ξᵤ', and then put all the members of all the subsets
together and take them as the basic bras of the representation. These bras
are all orthogonal, two of them being orthogonal from the orthogonality theorem
if they belong to different sets of eigenvalues and from the special way in which
they were chosen if they belong to the same set of eigenvalues, and they form
altogether a complete set of bras, as any bra can be expressed linearly in terms
of simultaneous eigenbras and each simultaneous eigenbra can then be expressed
linearly in terms of the members of a subset. There are infinitely many ways of
choosing the subsets, and each way provides one orthogonal representation.

For labelling the basic bras in this general case, we may use the eigenvalues
ξ₁', ξ₂', ..., ξᵤ' to which they belong, together with certain additional real variables
λ₁, λ₂, ..., λᵥ say, which must be introduced to distinguish basic vectors belonging
to the same set of eigenvalues from one another. A basic bra is then written
⟨ξ₁'ξ₂'...ξᵤ'λ₁λ₂...λᵥ|. Corresponding to the variables λ₁, λ₂, ..., λᵥ we can define
linear operators L₁, L₂, ..., Lᵥ by equations like (1) and can show that these linear
operators have the basic bras as eigenbras, and that they are real and observables,
and that they commute with one another and with the ξ's. The basic bras
are now simultaneous eigenbras of all the commuting observables ξ₁, ξ₂, ..., ξᵤ,
L₁, L₂, ..., Lᵥ.

Let us define a complete set of commuting observables to be a set of observables
which all commute with one another and for which there is only one simultaneous
eigenstate belonging to any set of eigenvalues. Then the observables ξ₁, ξ₂, ..., ξᵤ,
L₁, L₂, ..., Lᵥ form a complete set of commuting observables, there being only
one independent simultaneous eigenbra belonging to the eigenvalues ξ₁', ξ₂', ..., ξᵤ',
λ₁, λ₂, ..., λᵥ, namely the corresponding basic bra. Similarly the observables
L₁, L₂, ..., Lᵤ defined by equation (1) and the following work form a complete
set of commuting observables. With the help of this definition the main results of
the present section can be concisely formulated thus:

(i) The basic bras of an orthogonal representation are simultaneous eigenbras 
of a complete set of commuting observables. 

(ii) Given a complete set of commuting observables, we can set up an orthogonal 
representation in which the basic bras are simultaneous eigenbras of 
this complete set. 

(iii) Any set of commuting observables can be made into a complete commuting 
set by adding certain observables to it. 

(iv) A convenient way of labelling the basic bras of an orthogonal representation 
is by means of the eigenvalues of the complete set of commuting observables 
of which the basic bras are simultaneous eigenbras. 

The conjugate imaginaries of the basic bras of a representation we call the basic
kets of the representation. Thus, if the basic bras are denoted by ⟨λ₁λ₂...λᵤ|,
the basic kets will be denoted by |λ₁λ₂...λᵤ⟩. The representative of a bra ⟨b|
is given by its scalar product with each of the basic kets, i.e. by ⟨b|λ₁λ₂...λᵤ⟩.
It may, like the representative of a ket, be looked upon either as a set of numbers
or as a function of the variables λ₁, λ₂, ..., λᵤ. We have
$$\langle b | \lambda_1\lambda_2\ldots\lambda_u\rangle = \overline{\langle \lambda_1\lambda_2\ldots\lambda_u | b\rangle},$$

showing that the representative of a bra is the conjugate complex of
the representative of the conjugate imaginary ket. In an orthogonal representation,
where the basic bras are simultaneous eigenbras of a complete set of commuting
observables, ξ₁, ξ₂, ..., ξᵤ say, the basic kets will be simultaneous eigenkets of
ξ₁, ξ₂, ..., ξᵤ.

We have not yet considered the lengths of the basic vectors. With an orthogonal 
representation, the natural thing to do is to normalize the basic vectors, rather than 
leave their lengths arbitrary, and so introduce a further stage of simplification into 
the representation. However, it is possible to normalize them only if the parameters 
which label them all take on discrete values. If any of these parameters are 
continuous variables that can take on all values in a range, the basic vectors are 
eigenvectors of some observable belonging to eigenvalues in a range and are of 
infinite length, from the discussion in §10.* Some other procedure is then needed
to fix the numerical factors by which the basic vectors may be multiplied. To get 
a convenient method of handling this question a new mathematical notation is 
required, which will be given in the next section. 


15. The δ function

Our work in §10 led us to consider quantities involving a certain kind of infinity.
To get a precise notation for dealing with these infinities, we introduce a quantity
δ(x) depending on a parameter x satisfying the conditions

$$\int_{-\infty}^{\infty} \delta(x)\,dx = 1, \qquad \delta(x) = 0 \ \text{for}\ x \neq 0. \tag{2}$$

To get a picture of δ(x), take a function of the real variable x which vanishes
everywhere except inside a small domain, of length ε say, surrounding the origin
x = 0, and which is so large inside this domain that its integral over this domain is
unity. The exact shape of the function inside this domain does not matter, provided
there are no unnecessarily wild variations (for example provided the function is
always of order ε⁻¹). Then in the limit ε → 0 this function will go over into δ(x).

δ(x) is not a function of x according to the usual mathematical definition
of a function, which requires a function to have a definite value for each point
in its domain, but is something more general, which we may call an 'improper
function' to show up its difference from a function defined by the usual definition.
Thus δ(x) is not a quantity which can be generally used in mathematical analysis
like an ordinary function, but its use must be confined to certain simple types of
expression for which it is obvious that no inconsistency can arise.

*[See page 33.]

The most important property of δ(x) is exemplified by the following equation,

$$\int_{-\infty}^{\infty} f(x)\,\delta(x)\,dx = f(0), \tag{3}$$

where f(x) is any continuous function of x. We can easily see the validity of
this equation from the above picture of δ(x). The left-hand side of (3) can depend
only on the values of f(x) very close to the origin, so that we may replace f(x)
by its value at the origin, f(0), without essential error. Equation (3) then follows
from the first of equations (2). By making a change of origin in (3), we can deduce
the formula

$$\int_{-\infty}^{\infty} f(x)\,\delta(x-a)\,dx = f(a), \tag{4}$$

where a is any real number. Thus the process of multiplying a function of x by
δ(x − a) and integrating over all x is equivalent to the process of substituting a for x.
This general result holds also if the function of x is not a numerical one, but is
a vector or linear operator depending on x.
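
The picture of δ(x) as a tall narrow function can be tried out numerically. The Python sketch below assumes a rectangular shape of width ε and height 1/ε and a simple midpoint rule; the names `rect_delta`, `integrate` and `sift`, the step count and the value ε = 0.01 are all choices made for this illustration only. It checks the first of equations (2) and the substitution rule (4) approximately.

```python
from math import cos

def rect_delta(x, eps):
    """Rectangle of width eps and height 1/eps centred on the origin:
    one admissible shape for the picture of delta(x)."""
    return 1.0 / eps if abs(x) < eps / 2.0 else 0.0

def integrate(g, lo, hi, n=100_000):
    """Midpoint-rule estimate of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

def sift(f, a, eps):
    """Approximate the left-hand side of (4): integral of f(x) delta(x - a) dx
    over a domain surrounding the point x = a."""
    return integrate(lambda x: f(x) * rect_delta(x - a, eps), a - 1.0, a + 1.0)

unit_area = integrate(lambda x: rect_delta(x, 0.01), -1.0, 1.0)
approx_value = sift(cos, 0.3, 0.01)   # should be close to cos(0.3)
```

Shrinking ε (with a correspondingly finer grid) brings `approx_value` closer to f(a), in line with the limit ε → 0 described above.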

The range of integration in (3) and (4) need not be from −∞ to ∞, but may
be over any domain surrounding the critical point at which the δ function does
not vanish. In future the limits of integration will usually be omitted in such
equations, it being understood that the domain of integration is a suitable one.

Equations (3) and (4) show that, although an improper function does not 
itself have a well-defined value, when it occurs as a factor in an integrand 
the integral has a well-defined value. In quantum theory, whenever an improper 
function appears, it will be something which is to be used ultimately in 
an integrand. Therefore it should be possible to rewrite the theory in a form 
in which the improper functions appear all through only in integrands. One could 
then eliminate the improper functions altogether. The use of improper functions 
thus does not involve any lack of rigour in the theory, but is merely a convenient 
notation, enabling us to express in a concise form certain relations which we could, 
if necessary, rewrite in a form not involving improper functions, but only in 
a cumbersome way which would tend to obscure the argument. 

An alternative way of defining the δ function is as the differential coefficient
ε'(x) of the function ε(x) given by

$$\epsilon(x) = 0 \quad (x < 0), \qquad \epsilon(x) = 1 \quad (x > 0). \tag{5}$$

We may verify that this is equivalent to the previous definition by substituting
ε'(x) for δ(x) in the left-hand side of (3) and integrating by parts. We find, for g₁
and g₂ two positive numbers,

$$\int_{-g_2}^{g_1} f(x)\,\epsilon'(x)\,dx = \Big[f(x)\,\epsilon(x)\Big]_{-g_2}^{g_1} - \int_{-g_2}^{g_1} f'(x)\,\epsilon(x)\,dx$$
$$= f(g_1) - \int_{0}^{g_1} f'(x)\,dx$$
$$= f(0),$$

in agreement with (3). The δ function appears whenever one differentiates
a discontinuous function.

There are a number of elementary equations which one can write down about
δ functions. These equations are essentially rules of manipulation for algebraic
work involving δ functions. The meaning of any of these equations is that its two
sides give equivalent results as factors in an integrand.

Examples of such equations are

$$\delta(-x) = \delta(x), \tag{6}$$
$$x\,\delta(x) = 0, \tag{7}$$
$$\delta(ax) = a^{-1}\,\delta(x) \quad (a > 0), \tag{8}$$
$$\delta(x^2 - a^2) = \tfrac{1}{2}a^{-1}\{\delta(x-a) + \delta(x+a)\} \quad (a > 0), \tag{9}$$
$$\int \delta(a-x)\,dx\,\delta(x-b) = \delta(a-b), \tag{10}$$
$$f(x)\,\delta(x-a) = f(a)\,\delta(x-a). \tag{11}$$


Equation (6), which merely states that δ(x) is an even function of its variable x, is
trivial. To verify (7) take any continuous function of x, f(x). Then

$$\int f(x)\,x\,\delta(x)\,dx = 0$$

from (3). Thus xδ(x) as a factor in an integrand is equivalent to zero, which is just
the meaning of (7). (8) and (9) may be verified by similar elementary arguments.
To verify (10) take any continuous function of a, f(a). Then

$$\int f(a)\,da \int \delta(a-x)\,dx\,\delta(x-b) = \int \delta(x-b)\,dx \int f(a)\,da\,\delta(a-x)$$
$$= \int \delta(x-b)\,dx\,f(x)$$
$$= \int f(a)\,da\,\delta(a-b).$$

Thus the two sides of (10) are equivalent as factors in an integrand with a as
variable of integration. It may be shown in the same way that they are
equivalent also as factors in an integrand with b as variable of integration, so that
equation (10) is justified from either of these points of view. Equation (11) is also
easily justified, with the help of (4), from two points of view.
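
Rules (8) and (9) can also be checked numerically as statements about integrands. The Python sketch below assumes a narrow normalized Gaussian as a stand-in for δ(x), a midpoint rule, and an arbitrary polynomial test function; the width 0.01, the grid, and the names `gauss_delta`, `integrate` and `f` are illustrative assumptions, not part of the text.

```python
from math import exp, pi, sqrt

def gauss_delta(x, eps=0.01):
    """Narrow normalized Gaussian used as an approximation to delta(x);
    the width eps is an assumed, illustrative value."""
    return exp(-x * x / (2.0 * eps * eps)) / (eps * sqrt(2.0 * pi))

def integrate(g, lo, hi, n=60_000):
    """Midpoint-rule estimate of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

def f(x):
    """An arbitrary continuous test function."""
    return x * x + x + 1.0

a = 1.0

# Rule (8) with the factor 2: delta(2x) acts like (1/2) delta(x) in an integrand.
lhs8 = integrate(lambda x: f(x) * gauss_delta(2.0 * x), -3.0, 3.0)
rhs8 = f(0.0) / 2.0

# Rule (9): delta(x^2 - a^2) acts like (1/2a) {delta(x - a) + delta(x + a)}.
lhs9 = integrate(lambda x: f(x) * gauss_delta(x * x - a * a), -3.0, 3.0)
rhs9 = (f(a) + f(-a)) / (2.0 * a)
```

The two sides agree only up to terms that vanish with the width of the Gaussian, which is all that equivalence 'as factors in an integrand' requires.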

Equation (10) would be given by an application of (4) with f(x) = δ(x − b).
We have here an illustration of the fact that we may often use an improper function
as though it were an ordinary continuous function, without getting a wrong result.

Equation (7) shows that, whenever one divides both sides of an equation by
a variable x which can take on the value zero, one should add on to one side
an arbitrary multiple of δ(x), i.e. from an equation

$$A = B \tag{12}$$
one cannot infer
$$A/x = B/x,$$
but only
$$A/x = B/x + c\,\delta(x), \tag{13}$$

where c is unknown.

As an illustration of work with the δ function, we may consider
the differentiation of logₑ x. The usual formula
$$\frac{d}{dx}\log_e x = \frac{1}{x} \tag{14}$$
requires examination for the neighbourhood of x = 0. In order to make
the reciprocal function 1/x well defined in the neighbourhood of x = 0 (in the sense
of an improper function) we must impose on it an extra condition, such as that
its integral from −ε to ε vanishes. With this extra condition, the integral of
the right-hand side of (14) from −ε to ε vanishes, while that of the left-hand side
of (14) equals logₑ(−1), so that (14) is not a correct equation. To correct it,
we must remember that, taking principal values, logₑ x has† an imaginary term
iπ for negative values of x. As x passes through the value zero this† imaginary
term vanishes discontinuously. The differentiation of this† imaginary term gives
us the result −iπδ(x), so that (14) should read

$$\frac{d}{dx}\log_e x = \frac{1}{x} - i\pi\,\delta(x). \tag{15}$$

The particular combination of reciprocal function and δ function appearing in (15)
plays an important part in the quantum theory of collision processes (see §50).


16. Properties of the basic vectors 
Using the notation of the δ function, we can proceed with the theory of
representations. Let us suppose first that we have a single observable ξ forming
by itself a complete commuting set, the condition for this being that there
is only one eigenstate of ξ belonging to any eigenvalue ξ', and let us set up
an orthogonal representation in which the basic vectors are eigenvectors of ξ and
are written ⟨ξ'|, |ξ'⟩.

In the case when the eigenvalues of ξ are discrete, we can normalize the basic
vectors, and we then have
$$\langle \xi' | \xi'' \rangle = 0 \quad (\xi' \neq \xi''), \qquad \langle \xi' | \xi' \rangle = 1.$$
These equations can be combined into the single equation
$$\langle \xi' | \xi'' \rangle = \delta_{\xi'\xi''}, \tag{16}$$
where the symbol δ with two suffixes, which we shall often use in the future,
has the meaning
$$\delta_{rs} = 0 \ \text{when}\ r \neq s, \qquad \delta_{rs} = 1 \ \text{when}\ r = s. \tag{17}$$

†['pure' omitted.]

In the case when the eigenvalues of ξ are continuous we cannot normalize
the basic vectors. If we now consider the quantity ⟨ξ'|ξ''⟩ with ξ' fixed and
ξ'' varying, we see from the work connected with expression (29) of §10 that
this quantity vanishes for ξ'' ≠ ξ' and that its integral over a range of ξ'' extending
through the value ξ' is finite, equal to c say. Thus

$$\langle \xi' | \xi'' \rangle = c\,\delta(\xi' - \xi'').$$
From (30) of §10, c is a positive number. It may vary with ξ', so we should write
it c(ξ') or c' for brevity, and thus we have

$$\langle \xi' | \xi'' \rangle = c'\,\delta(\xi' - \xi''). \tag{18}$$
Alternatively, we have
$$\langle \xi' | \xi'' \rangle = c''\,\delta(\xi' - \xi''), \tag{19}$$
where c'' is short for c(ξ''), the right-hand sides of (18) and (19) being equal on
account of (11).

Let us pass to another representation whose basic vectors are eigenvectors
of ξ, the new basic vectors being numerical multiples of the previous ones.
Calling the new basic vectors ⟨ξ'*|, |ξ'*⟩, with the additional label * to distinguish
them from the previous ones, we have

$$\langle \xi'^{*} | = k' \langle \xi' |, \qquad | \xi'^{*} \rangle = \bar{k}' | \xi' \rangle,$$
where k' is short for k(ξ') and is a number depending on ξ'. We get
$$\langle \xi'^{*} | \xi''^{*} \rangle = k'\bar{k}'' \langle \xi' | \xi'' \rangle = k'\bar{k}''c'\,\delta(\xi' - \xi'')$$
with the help of (18). This may be written
$$\langle \xi'^{*} | \xi''^{*} \rangle = k'\bar{k}'c'\,\delta(\xi' - \xi'')$$

from (11). By choosing k' so that its modulus is c'^(−1/2), which is possible since c' is
positive, we arrange to have
$$\langle \xi'^{*} | \xi''^{*} \rangle = \delta(\xi' - \xi''). \tag{20}$$
The lengths of the new basic vectors are now fixed so as to make the representation
as simple as possible. The way these lengths were fixed is in some respects
analogous to the normalizing of the basic vectors in the case of discrete ξ,
equation (20) being of the form of (16) with the δ function δ(ξ' − ξ'') replacing
the δ symbol δ_{ξ'ξ''} of equation (16). We shall continue to work with the new
representation and shall drop the * labels in it to save writing. Thus (20) will now
be written
$$\langle \xi' | \xi'' \rangle = \delta(\xi' - \xi''). \tag{21}$$

We can develop the theory on closely parallel lines for the discrete and
continuous cases. For the discrete case we have, using (16),

$$\sum_{\xi'} |\xi'\rangle \langle \xi' | \xi'' \rangle = \sum_{\xi'} |\xi'\rangle\,\delta_{\xi'\xi''} = |\xi''\rangle,$$

the sum being taken over all eigenvalues. This equation holds for any basic ket
|ξ''⟩ and hence, since the basic kets form a complete set,

$$\sum_{\xi'} |\xi'\rangle \langle \xi' | = 1. \tag{22}$$

This is a useful equation expressing an important property of the basic vectors,
namely, if |ξ'⟩ is multiplied on the right by ⟨ξ'| the resulting linear operator,
summed for all ξ', equals the unit operator. Equations (16) and (22) give
the fundamental properties of the basic vectors for the discrete case.

Similarly, for the continuous case we have, using (21),

$$\int |\xi'\rangle\,d\xi'\,\langle \xi' | \xi'' \rangle = \int |\xi'\rangle\,d\xi'\,\delta(\xi' - \xi'') = |\xi''\rangle \tag{23}$$

from (4) applied with a ket vector for f(x), the range of integration being the range
of eigenvalues. This holds for any basic ket |ξ''⟩ and hence

$$\int |\xi'\rangle\,d\xi'\,\langle \xi' | = 1. \tag{24}$$

This is of the same form as (22) with an integral replacing the sum.
Equations (21) and (24) give the fundamental properties of the basic vectors for
the continuous case.

Equations (22) and (24) enable one to expand any bra or ket in terms of
the basic vectors. For example, we get for the ket |P⟩ in the discrete case,
by multiplying (22) on the right by |P⟩,

$$|P\rangle = \sum_{\xi'} |\xi'\rangle \langle \xi' | P \rangle, \tag{25}$$

which gives |P⟩ expanded in terms of the |ξ'⟩ and shows that the coefficients in
the expansion are ⟨ξ'|P⟩, which are just the numbers forming the representative
of |P⟩. Similarly, in the continuous case,

$$|P\rangle = \int |\xi'\rangle\,d\xi'\,\langle \xi' | P \rangle, \tag{26}$$

giving |P⟩ as an integral over the |ξ'⟩, with the coefficient in the integrand again
just the representative ⟨ξ'|P⟩ of |P⟩. The conjugate imaginary equations to (25)
and (26) would give the bra vector ⟨P| expanded in terms of the basic bras.

Our present mathematical methods enable us in the continuous case to expand
any ket as an integral of eigenkets of ξ. If we do not use the δ function notation,
the expansion of a general ket will consist of an integral plus a sum, as in
equation (25) of §10, but the δ function enables us to replace the sum by an integral
in which the integrand consists of terms each containing a δ function as a factor.
For example, the eigenket |ξ''⟩ may be replaced by an integral of eigenkets, as is
shown by the second of equations (23).

If ⟨Q| is any bra and |P⟩ any ket we get, by further applications of (22) and (24),

$$\langle Q | P \rangle = \sum_{\xi'} \langle Q | \xi' \rangle \langle \xi' | P \rangle \tag{27}$$
for discrete ξ and
$$\langle Q | P \rangle = \int \langle Q | \xi' \rangle\,d\xi'\,\langle \xi' | P \rangle \tag{28}$$
for continuous ξ. These equations express the scalar product of ⟨Q| and |P⟩ in
terms of their representatives ⟨Q|ξ'⟩ and ⟨ξ'|P⟩. Equation (27) is just the usual
formula for the scalar product of two vectors in terms of the coordinates of
the vectors, and (28) is the natural modification of this formula for the case of
continuous ξ', with an integral instead of a sum.
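
For the discrete case, (22) and (27) can be checked directly in a small example. The Python sketch below assumes a two-dimensional space with a particular orthonormal pair of basic kets and two arbitrary kets |P⟩, |Q⟩; these vectors and the helper name `braket` are hypothetical choices for illustration only.

```python
from math import sqrt

s = 1.0 / sqrt(2.0)
# Assumed orthonormal basic kets |xi'> in two dimensions (illustrative choice).
basic_kets = [[s, 1j * s], [s, -1j * s]]

def braket(q, p):
    """<q|p> for kets given as component lists; the bra components are conjugated."""
    return sum(qi.conjugate() * pj for qi, pj in zip(q, p))

# Sum over xi' of |xi'><xi'| as a 2x2 matrix: element (i, j) is sum_k k_i conj(k_j).
completeness = [[sum(k[i] * k[j].conjugate() for k in basic_kets)
                 for j in range(2)] for i in range(2)]

ket_P = [1.0 + 2.0j, -0.5j]
ket_Q = [0.3, 2.0 - 1.0j]

rep_P = [braket(k, ket_P) for k in basic_kets]   # representative <xi'|P>
rep_Q = [braket(ket_Q, k) for k in basic_kets]   # representative <Q|xi'>

direct = braket(ket_Q, ket_P)
via_representatives = sum(q * p for q, p in zip(rep_Q, rep_P))   # equation (27)
```

Here `completeness` comes out as the unit matrix, as (22) requires, and the two ways of forming ⟨Q|P⟩ agree to rounding error.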

The generalization of the foregoing work to the case when ξ has both discrete
and continuous eigenvalues is quite straightforward. Using ξʳ and ξˢ to denote
discrete eigenvalues and ξ' and ξ'' to denote continuous eigenvalues, we have the set
of equations
$$\langle \xi^r | \xi^s \rangle = \delta_{\xi^r\xi^s}, \qquad \langle \xi^r | \xi' \rangle = 0, \qquad \langle \xi' | \xi'' \rangle = \delta(\xi' - \xi'') \tag{29}$$
as the generalization of (16) or (21). These equations express that the basic vectors
are all orthogonal, that those belonging to discrete eigenvalues are normalized and
those belonging to continuous eigenvalues have their lengths fixed by the same rule
as led to (20). From (29) we can derive, as the generalization of (22) or (24),

$$\sum_{\xi^r} |\xi^r\rangle \langle \xi^r | + \int |\xi'\rangle\,d\xi'\,\langle \xi' | = 1, \tag{30}$$
the range of integration being the range of continuous eigenvalues. With the help
of (30), we get immediately

$$|P\rangle = \sum_{\xi^r} |\xi^r\rangle \langle \xi^r | P \rangle + \int |\xi'\rangle\,d\xi'\,\langle \xi' | P \rangle \tag{31}$$
as the generalization of (25) or (26), and
$$\langle Q | P \rangle = \sum_{\xi^r} \langle Q | \xi^r \rangle \langle \xi^r | P \rangle + \int \langle Q | \xi' \rangle\,d\xi'\,\langle \xi' | P \rangle \tag{32}$$

as the generalization of (27) or (28).

Let us now pass to the general case when we have several commuting
observables ξ₁, ξ₂, ..., ξᵤ forming a complete commuting set and set up
an orthogonal representation in which the basic vectors are simultaneous
eigenvectors of all of them, and are written ⟨ξ₁'...ξᵤ'|, |ξ₁'...ξᵤ'⟩. Let us
suppose ξ₁, ξ₂, ..., ξᵥ (v < u) have discrete eigenvalues and ξᵥ₊₁, ..., ξᵤ have
continuous eigenvalues.

Consider the quantity
$$\langle \xi_1' \ldots \xi_v'\,\xi_{v+1}' \ldots \xi_u' \,|\, \xi_1' \ldots \xi_v'\,\xi_{v+1}'' \ldots \xi_u'' \rangle.$$

From the orthogonality theorem, it must vanish unless each ξₛ'' = ξₛ' for
s = v+1, ..., u. By extending the work connected with expression (29) of §10
to simultaneous eigenvectors of several commuting observables and extending
also the axiom (30), we find that the (u − v)-fold integral of this quantity
with respect to each ξₛ'' over a range extending through the value ξₛ' is a finite
positive number. Calling this number c', the ' denoting that it is a function of
ξ₁', ..., ξᵥ', ξᵥ₊₁', ..., ξᵤ', we can express our results by the equation

$$\langle \xi_1' \ldots \xi_v'\,\xi_{v+1}' \ldots \xi_u' \,|\, \xi_1' \ldots \xi_v'\,\xi_{v+1}'' \ldots \xi_u'' \rangle = c'\,\delta(\xi_{v+1}' - \xi_{v+1}'') \ldots \delta(\xi_u' - \xi_u''), \tag{33}$$
with one δ factor on the right-hand side for each value of s from v+1 to u. We now
change the lengths of our basic vectors so as to make c' unity, by a procedure similar
to that which led to (20). By a further use of the orthogonality theorem, we get
finally
$$\langle \xi_1' \ldots \xi_u' \,|\, \xi_1'' \ldots \xi_u'' \rangle = \delta_{\xi_1'\xi_1''} \ldots \delta_{\xi_v'\xi_v''}\,\delta(\xi_{v+1}' - \xi_{v+1}'') \ldots \delta(\xi_u' - \xi_u''), \tag{34}$$
with a two-suffix δ symbol on the right-hand side for each ξ with discrete
eigenvalues and a δ function for each ξ with continuous eigenvalues. This is
the generalization of (16) or (21) to the case when there are several commuting
observables in the complete set.

From (34) we can derive, as the generalization of (22) or (24),

$$\sum_{\xi_1' \ldots \xi_v'} \int \!\cdots\! \int |\xi_1' \ldots \xi_u'\rangle\,d\xi_{v+1}' \ldots d\xi_u'\,\langle \xi_1' \ldots \xi_u' | = 1, \tag{35}$$
the integral being a (u − v)-fold one over all the ξ' with continuous eigenvalues
and the summation being over all the ξ' with discrete eigenvalues. Equations (34)
and (35) give the fundamental properties of the basic vectors in the present case.
From (35) we can immediately write down the generalization of (25) or (26) and
of (27) or (28).

The case we have just considered can be further generalized by allowing some
of the $\xi$'s to have both discrete and continuous eigenvalues. The modifications
required in the equations are quite straightforward, but will not be given here as
they are rather cumbersome to write down in general form.

There are some problems in which it is convenient not to make the $c'$ of
equation (33) equal unity, but to make it equal to some definite function of
the $\xi'$'s instead. Calling this function of the $\xi'$'s $\rho'^{-1}$ we then have, instead of (34),¹

$$\langle\xi'_1\ldots\xi'_u|\xi''_1\ldots\xi''_u\rangle = \rho'^{-1}\,\delta_{\xi'_1\xi''_1}\cdots\delta_{\xi'_v\xi''_v}\,\delta(\xi'_{v+1}-\xi''_{v+1})\cdots\delta(\xi'_u-\xi''_u), \tag{36}$$
and instead of (35) we get

$$\sum_{\xi'_1\ldots\xi'_v}\int\!\cdots\!\int |\xi'_1\ldots\xi'_u\rangle\,\rho'\,d\xi'_{v+1}\ldots d\xi'_u\,\langle\xi'_1\ldots\xi'_u| = 1. \tag{37}$$
$\rho'$ is called the weight function of the representation, $\rho'\,d\xi'_{v+1}\ldots d\xi'_u$ being
the 'weight' attached to a small volume element of the space of the variables
$\xi'_{v+1}, \ldots, \xi'_u$.

The representations we considered previously all had the weight function unity.
The introduction of a weight function not unity is entirely a matter of convenience
and does not add anything to the mathematical power of the representation.
The basic bras $\langle\xi'_1\ldots\xi'_u{*}|$ of a representation with the weight function $\rho'$ are
connected with the basic bras $\langle\xi'_1\ldots\xi'_u|$ of the corresponding representation with
the weight function unity by
$$\langle\xi'_1\ldots\xi'_u{*}| = \rho'^{-1/2}\,\langle\xi'_1\ldots\xi'_u|, \tag{38}$$
as is easily verified. An example of a useful representation with non-unit weight
function occurs when one has two $\xi$'s which are the polar and azimuthal angles
$\theta$ and $\phi$ giving a direction in three-dimensional space and one takes $\rho' = \sin\theta'$.
One then has the element of solid angle $\sin\theta'\,d\theta'\,d\phi'$ occurring in (37).
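
The verification of (38) is short enough to write out. The sketch below treats a single $\xi$ with continuous eigenvalues for brevity and assumes that $\rho'$ is real and positive, so that the starred kets are $|\xi'{*}\rangle = \rho'^{-1/2}|\xi'\rangle$; the symbol $\rho''$, introduced here only for illustration, denotes the same weight function evaluated at $\xi''$.

```latex
\begin{aligned}
\langle\xi'{*}|\xi''{*}\rangle
  &= \rho'^{-1/2}\,\rho''^{-1/2}\,\langle\xi'|\xi''\rangle \\
  &= \rho'^{-1/2}\,\rho''^{-1/2}\,\delta(\xi'-\xi'') \\
  &= \rho'^{-1}\,\delta(\xi'-\xi'').
\end{aligned}
```

The last step uses the $\delta$ function to put $\xi'' = \xi'$ in the coefficient, and the result has the form of (36) for this case.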


¹ The final subscript $u$ is a correction in the 4th edition, revised from the $v$ in the third.


17. The representation of linear operators
In §14 we saw how to represent ket and bra vectors by sets of numbers. We now 
have to do the same for linear operators, in order to have a complete scheme for 
representing all our abstract quantities by sets of numbers. The same basic vectors 
that we had in §14 can be used again for this purpose. 

Let us suppose the basic vectors are simultaneous eigenvectors of a complete
set of commuting observables $\xi_1, \xi_2, \ldots, \xi_u$. If $\alpha$ is any linear operator,
we take a general basic bra $\langle\xi'_1\ldots\xi'_u|$ and a general basic ket $|\xi''_1\ldots\xi''_u\rangle$ and form
the numbers
$$\langle\xi'_1\ldots\xi'_u|\alpha|\xi''_1\ldots\xi''_u\rangle. \tag{39}$$

These numbers are sufficient to determine $\alpha$ completely, since in the first
place they determine the ket $\alpha|\xi''_1\ldots\xi''_u\rangle$ (as they provide the representative of
this ket), and the value of this ket for all the basic kets $|\xi''_1\ldots\xi''_u\rangle$ determines $\alpha$.
The numbers (39) are called the representative of the linear operator $\alpha$ or of
the dynamical variable $\alpha$. They are more complicated than the representative of
a ket or bra vector in that they involve the parameters that label two basic vectors
instead of one.

Let us examine the form of these numbers in simple cases. Take first the case
when there is only one $\xi$, forming a complete commuting set by itself, and suppose
that it has discrete eigenvalues $\xi'$. The representative of $\alpha$ is then the discrete set
of numbers $\langle\xi'|\alpha|\xi''\rangle$. If one had to write out these numbers explicitly, the natural
way of arranging them would be as a two-dimensional array, thus:

$$\begin{matrix}
\langle\xi^1|\alpha|\xi^1\rangle & \langle\xi^1|\alpha|\xi^2\rangle & \langle\xi^1|\alpha|\xi^3\rangle & \cdots \\
\langle\xi^2|\alpha|\xi^1\rangle & \langle\xi^2|\alpha|\xi^2\rangle & \langle\xi^2|\alpha|\xi^3\rangle & \cdots \\
\langle\xi^3|\alpha|\xi^1\rangle & \langle\xi^3|\alpha|\xi^2\rangle & \langle\xi^3|\alpha|\xi^3\rangle & \cdots \\
\vdots & \vdots & \vdots &
\end{matrix} \tag{40}$$

where $\xi^1, \xi^2, \xi^3, \ldots$ are all the eigenvalues of $\xi$. Such an array is called a matrix
and the numbers are called the elements of the matrix. We make the convention
that the elements must always be arranged so that those in the same row refer to
the same basic bra vector and those in the same column refer to the same basic
ket vector.

An element $\langle\xi'|\alpha|\xi'\rangle$ referring to two basic vectors with the same label is called
a diagonal element of the matrix, as all such elements lie on a diagonal. If we put
$\alpha$ equal to unity, we have from (16) all the diagonal elements equal to unity and
all the other elements equal to zero. The matrix is then called the unit matrix.

If $\alpha$ is real, we have
$$\langle\xi'|\alpha|\xi''\rangle = \overline{\langle\xi''|\alpha|\xi'\rangle}. \tag{41}$$
The effect of these conditions on the matrix (40) is to make the diagonal elements
all real and each of the other elements equal the conjugate complex of its mirror
reflection in the diagonal. The matrix is then called a Hermitian matrix.

If we put $\alpha$ equal to $\xi$, we get for a general element of the matrix

$$\langle\xi'|\xi|\xi''\rangle = \xi'\langle\xi'|\xi''\rangle = \xi'\,\delta_{\xi'\xi''}. \tag{42}$$
Thus all the elements not on the diagonal are zero. The matrix is then called
a diagonal matrix. Its diagonal elements are just equal to the eigenvalues of $\xi$.
More generally, if we put $\alpha$ equal to $f(\xi)$, a function of $\xi$, we get
$$\langle\xi'|f(\xi)|\xi''\rangle = f(\xi')\,\delta_{\xi'\xi''}, \tag{43}$$
and the matrix is again a diagonal matrix.
Let us determine the representative of a product $\alpha\beta$ of two linear operators $\alpha$
and $\beta$ in terms of the representatives of the factors. From equation (22) with $\xi'''$
substituted for $\xi'$ we obtain
$$\langle\xi'|\alpha\beta|\xi''\rangle = \langle\xi'|\alpha\sum_{\xi'''}|\xi'''\rangle\langle\xi'''|\beta|\xi''\rangle = \sum_{\xi'''}\langle\xi'|\alpha|\xi'''\rangle\langle\xi'''|\beta|\xi''\rangle, \tag{44}$$
which gives us the required result. Equation (44) shows that the matrix formed by
the elements $\langle\xi'|\alpha\beta|\xi''\rangle$ equals the product of the matrices formed by the elements
$\langle\xi'|\alpha|\xi''\rangle$ and $\langle\xi'|\beta|\xi''\rangle$ respectively, according to the usual mathematical rule for
multiplying matrices. This rule gives for the element in the $r$th row and $s$th column
of the product matrix the sum of the product of each element in the $r$th row
of the first factor matrix with the corresponding element in the $s$th column of
the second factor matrix. The multiplication of matrices is non-commutative,
like the multiplication of linear operators.
We can summarize our results for the case when there is only one $\xi$ and it has
discrete eigenvalues as follows:

(i) Any linear operator is represented by a matrix.

(ii) The unit operator is represented by the unit matrix.

(iii) A real linear operator is represented by a Hermitian matrix.

(iv) $\xi$ and functions of $\xi$ are represented by diagonal matrices.

(v) The matrix representing the product of two linear operators is the product of
the matrices representing the two factors.
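
Rules (i) to (v) can be tried out on a small numerical case. The following is a minimal sketch, assuming a single observable with three made-up eigenvalues; the helper names `matmul` and `dagger` and all the matrix entries are illustrative choices, not part of the text.

```python
# Illustrative 3x3 example; the eigenvalues and matrix entries are made up.

def matmul(a, b):
    """Rule (44): element (r, s) is the sum over t of a[r][t] * b[t][s]."""
    n = len(a)
    return [[sum(a[r][t] * b[t][s] for t in range(n)) for s in range(n)]
            for r in range(n)]


def dagger(a):
    """Mirror the matrix in its diagonal and take complex conjugates."""
    n = len(a)
    return [[complex(a[s][r]).conjugate() for s in range(n)] for r in range(n)]


eigenvalues = [1, 2, 5]          # assumed eigenvalues of xi
n = len(eigenvalues)

# Rule (ii): the unit matrix.
unit = [[1 if r == s else 0 for s in range(n)] for r in range(n)]
# Rule (iv): xi itself, equation (42), and f(xi) with f(x) = x^2, equation (43).
xi = [[eigenvalues[r] if r == s else 0 for s in range(n)] for r in range(n)]
f_xi = [[eigenvalues[r] ** 2 if r == s else 0 for s in range(n)]
        for r in range(n)]

# Rule (iii): two real operators, represented by Hermitian matrices.
alpha = [[2, 1j, 0], [-1j, 0, 3], [0, 3, 1]]
beta = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]

is_hermitian = dagger(alpha) == alpha            # condition (41)
xi_squared_matches = matmul(xi, xi) == f_xi      # rule (v) applied to xi * xi
commute = matmul(alpha, beta) == matmul(beta, alpha)
```

Here `is_hermitian` and `xi_squared_matches` come out true, while `commute` comes out false, which illustrates that the multiplication of matrices is non-commutative.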

Let us now consider the case when there is only one $\xi$ and it has continuous
eigenvalues. The representative of $\alpha$ is now $\langle\xi'|\alpha|\xi''\rangle$, a function of two variables
$\xi'$ and $\xi''$ which can vary continuously. It is convenient to call such a function
a 'matrix', using this word in a generalized sense, in order that we may be able
to use the same terminology for the discrete and continuous cases. One of these
generalized matrices cannot, of course, be written out as a two-dimensional array
like an ordinary matrix, since the number of its rows and columns is an infinity
equal to the number of points on a line, and the number of its elements is an infinity
equal to the number of points in an area.

We arrange our definitions concerning these generalized matrices so that
the rules (i)-(v) which we had above for the discrete case hold also for
the continuous case. The unit operator is represented by $\delta(\xi' - \xi'')$ and
the generalized matrix formed by these elements we define to be the unit matrix.


We still have equation (41) as the condition for $\alpha$ to be real and we define
the generalized matrix formed by the elements $\langle\xi'|\alpha|\xi''\rangle$ to be Hermitian when
it satisfies this condition. $\xi$ is represented by
$$\langle\xi'|\xi|\xi''\rangle = \xi'\,\delta(\xi'-\xi'') \tag{45}$$
and $f(\xi)$ by
$$\langle\xi'|f(\xi)|\xi''\rangle = f(\xi')\,\delta(\xi'-\xi''), \tag{46}$$
and the generalized matrices formed by these elements we define to be diagonal
matrices. From (11), we could equally well have $\xi''$ and $f(\xi'')$ as the coefficients
of $\delta(\xi' - \xi'')$ on the right-hand sides of (45) and (46) respectively. Corresponding
to equation (44) we now have, from (24),

$$\langle\xi'|\alpha\beta|\xi''\rangle = \int \langle\xi'|\alpha|\xi'''\rangle\,d\xi'''\,\langle\xi'''|\beta|\xi''\rangle, \tag{47}$$

with an integral instead of a sum, and we define the generalized matrix formed by
the elements on the right-hand side here to be the product of the matrices formed
by $\langle\xi'|\alpha|\xi''\rangle$ and $\langle\xi'|\beta|\xi''\rangle$. With these definitions we secure complete parallelism
between the discrete and continuous cases and we have the rules (i)-(v) holding
for both.

The question arises how a general diagonal matrix is to be defined in
the continuous case, as so far we have only defined the right-hand sides of (45)
and (46) to be examples of diagonal matrices. One might be inclined to define
as diagonal any matrix whose $(\xi', \xi'')$ elements all vanish except when $\xi'$ differs
infinitely little from $\xi''$, but this would not be satisfactory, because an important
property of diagonal matrices in the discrete case is that they always commute
with one another and we want this property to hold also in the continuous case.
In order that the matrix formed by the elements $\langle\xi'|\omega|\xi''\rangle$ in the continuous case
may commute with that formed by the elements on the right-hand side of (45)
we must have, using the multiplication rule (47),

$$\int \langle\xi'|\omega|\xi'''\rangle\,d\xi'''\,\xi'''\,\delta(\xi'''-\xi'') - \int \xi'\,\delta(\xi'-\xi''')\,d\xi'''\,\langle\xi'''|\omega|\xi''\rangle = 0.$$
With the help of formula (4), this reduces to
$$\langle\xi'|\omega|\xi''\rangle\,\xi'' - \xi'\,\langle\xi'|\omega|\xi''\rangle = 0 \tag{48}$$
or
$$(\xi'-\xi'')\,\langle\xi'|\omega|\xi''\rangle = 0.$$
This gives, according to the rule by which (13) follows from (12),

$$\langle\xi'|\omega|\xi''\rangle = c'\,\delta(\xi'-\xi''),$$
where $c'$ is a number that may depend on $\xi'$. Thus $\langle\xi'|\omega|\xi''\rangle$ is of the form of
the right-hand side of (46). For this reason we define only matrices whose elements
are of the form of the right-hand side of (46) to be diagonal matrices. It is easily
verified that these matrices all commute with one another. One can form other
matrices whose $(\xi', \xi'')$ elements all vanish when $\xi'$ differs appreciably from $\xi''$ and
have a different form of singularity when $\xi'$ equals $\xi''$ [we shall later introduce
the derivative $\delta'(x)$ of the $\delta$ function and $\delta'(\xi' - \xi'')$ will then be an example,
see §22 equation (19)], but these other matrices are not diagonal according to
the definition.
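
The statement that matrices of the form (46) all commute can be checked directly from the multiplication rule (47). In the sketch below the two diagonal matrices are written with coefficient functions $f$ and $g$; these names are chosen here only for illustration.

```latex
\begin{aligned}
\int f(\xi')\,\delta(\xi'-\xi''')\,d\xi'''\,g(\xi''')\,\delta(\xi'''-\xi'')
  &= f(\xi')\,g(\xi')\,\delta(\xi'-\xi''), \\
\int g(\xi')\,\delta(\xi'-\xi''')\,d\xi'''\,f(\xi''')\,\delta(\xi'''-\xi'')
  &= g(\xi')\,f(\xi')\,\delta(\xi'-\xi'').
\end{aligned}
```

In each line the integration over $\xi'''$ is carried out with the first $\delta$ function. The two right-hand sides are equal, since $f(\xi')$ and $g(\xi')$ are ordinary numbers, so the two products agree.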
Let us now pass on to the case when there is only one $\xi$ and it has both
discrete and continuous eigenvalues. Using $\xi^r$, $\xi^s$ to denote discrete eigenvalues
and $\xi'$, $\xi''$ to denote continuous eigenvalues, we now have the representative of $\alpha$
consisting of four kinds of quantities: $\langle\xi^r|\alpha|\xi^s\rangle$, $\langle\xi^r|\alpha|\xi'\rangle$, $\langle\xi'|\alpha|\xi^r\rangle$, $\langle\xi'|\alpha|\xi''\rangle$.
These quantities can all be put together and considered to form a more general
kind of matrix having some discrete rows and columns and also a continuous range
of rows and columns. We define unit matrix, Hermitian matrix, diagonal matrix,
and the product of two matrices also for this more general kind of matrix so as
to make the rules (i)-(v) still hold. The details are a straightforward generalization
of what has gone before and need not be given explicitly.

Let us now go back to the general case of several $\xi$'s, $\xi_1, \xi_2, \ldots, \xi_u$.
The representative of $\alpha$, expression (39), may still be looked upon as forming
a matrix, with rows corresponding to different values of $\xi'_1, \ldots, \xi'_u$ and columns
corresponding to different values of $\xi''_1, \ldots, \xi''_u$. Unless all the $\xi$'s have discrete
eigenvalues, this matrix will be of the generalized kind with continuous ranges of
rows and columns. We again arrange our definitions so that the rules (i)-(v) hold,
with rule (iv) generalized to:

(iv′) Each $\xi_m$ ($m = 1, 2, \ldots, u$) and any function of them is represented by
a diagonal matrix.

A diagonal matrix is now defined as one whose general element
$\langle\xi'_1\ldots\xi'_u|\omega|\xi''_1\ldots\xi''_u\rangle$ is of the form

$$\langle\xi'_1\ldots\xi'_u|\omega|\xi''_1\ldots\xi''_u\rangle = c'\,\delta_{\xi'_1\xi''_1}\cdots\delta_{\xi'_v\xi''_v}\,\delta(\xi'_{v+1}-\xi''_{v+1})\cdots\delta(\xi'_u-\xi''_u) \tag{49}$$
in the case when $\xi_1, \ldots, \xi_v$ have discrete eigenvalues and $\xi_{v+1}, \ldots, \xi_u$ have
continuous eigenvalues, $c'$ being any function of the $\xi'$'s. This definition is
the generalization of what we had with one $\xi$ and makes diagonal matrices always
commute with one another. The other definitions are straightforward and need
not be given explicitly.

We now have a linear operator always represented by a matrix. The sum 
of two linear operators is represented by the sum of the matrices representing 
the operators and this, together with rule (v), means that the matrices are subject 
to the same algebraic relations as the linear operators. If any algebraic equation 
holds between certain linear operators, the same equation must hold between 
the matrices representing those operators. 

The scheme of matrices can be extended to bring in the representatives
of ket and bra vectors. The matrices representing linear operators are all
square matrices with the same number of rows and columns, and with, in fact,
a one-one correspondence between their rows and columns. We may look upon
the representative of a ket $|P\rangle$ as a matrix with a single column by setting all
the numbers $\langle\xi'_1\ldots\xi'_u|P\rangle$ which form this representative one below the other.
The number of rows in this matrix will be the same as the number of rows or
columns in the square matrices representing linear operators. Such a single-column
matrix can be multiplied on the left by a square matrix $\langle\xi'_1\ldots\xi'_u|\alpha|\xi''_1\ldots\xi''_u\rangle$
representing a linear operator, by a rule similar to that for the multiplication of
two square matrices. The product is another single-column matrix with elements

$$\sum_{\xi''_1\ldots\xi''_v}\int\!\cdots\!\int \langle\xi'_1\ldots\xi'_u|\alpha|\xi''_1\ldots\xi''_u\rangle\,d\xi''_{v+1}\ldots d\xi''_u\,\langle\xi''_1\ldots\xi''_u|P\rangle.$$

From (35) this is just equal to $\langle\xi'_1\ldots\xi'_u|\alpha|P\rangle$, the representative of $\alpha|P\rangle$. Similarly
we may look upon the representative of a bra $\langle Q|$ as a matrix with a single row by
setting all the numbers $\langle Q|\xi'_1\ldots\xi'_u\rangle$ side by side. Such a single-row matrix may be
multiplied on the right by a square matrix $\langle\xi'_1\ldots\xi'_u|\alpha|\xi''_1\ldots\xi''_u\rangle$, the product being
another single-row matrix, which is just the representative of $\langle Q|\alpha$. The single-row
matrix representing $\langle Q|$ may be multiplied on the right by the single-column
matrix representing $|P\rangle$, the product being a matrix with just a single element,
which is equal to $\langle Q|P\rangle$. Finally, the single-row matrix representing $\langle Q|$ may be
multiplied on the left by the single-column matrix representing $|P\rangle$, the product
being a square matrix, which is just the representative of $|P\rangle\langle Q|$. In this way
all our abstract symbols, linear operators, bra vectors and ket vectors, can be
represented by matrices, which are subject to the same algebraic relations as
the abstract symbols themselves.
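
The four kinds of product just described can be sketched for a case with two discrete basic vectors. The numbers below are made up for illustration, and the variable names are assumptions of this sketch rather than notation from the text.

```python
# Assumed two-dimensional example; all numbers are illustrative.

def conj(z):
    """Complex conjugate of a number."""
    return complex(z).conjugate()


n = 2
ket_P = [1, 1j]                      # column: the numbers <xi'|P>
ket_Q = [2, 1]                       # column: the numbers <xi'|Q>
bra_Q = [conj(c) for c in ket_Q]     # row: <Q|xi'> is the conjugate of <xi'|Q>
alpha = [[0, 1], [1, 0]]             # square matrix <xi'|alpha|xi''>

# Square matrix times column: the representative of alpha|P>.
alpha_P = [sum(alpha[r][s] * ket_P[s] for s in range(n)) for r in range(n)]

# Row times square matrix: the representative of <Q|alpha.
Q_alpha = [sum(bra_Q[r] * alpha[r][s] for r in range(n)) for s in range(n)]

# Row times column: a single number, <Q|P>.
Q_P = sum(bra_Q[r] * ket_P[r] for r in range(n))

# Column times row: a square matrix, the representative of |P><Q|.
P_Q = [[ket_P[r] * bra_Q[s] for s in range(n)] for r in range(n)]

# <Q|alpha|P> evaluated in the two possible orders must agree.
left_first = sum(Q_alpha[s] * ket_P[s] for s in range(n))
right_first = sum(bra_Q[r] * alpha_P[r] for r in range(n))
```

The agreement of `left_first` and `right_first` is the matrix form of the associative law for the product of a bra, a linear operator and a ket.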


18. Probability amplitudes 
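Representations are of great importance in the physical interpretation of quantum
mechanics as they provide a convenient method for obtaining the probabilities
of observables having given values. In §12 we obtained the probability of
an observable having any specified value for a given state and in §13 we generalized
this result and obtained the probability of a set of commuting observables
simultaneously having specified values for a given state. Let us now apply this
result to a complete set of commuting observables, say the set of $\xi$'s which we have
been dealing with already. According to formula (51) of §13, the probability of each
$\xi_r$ having the value $\xi'_r$ for the state corresponding to the normalized ket vector $|x\rangle$ is
$$P_{\xi'_1\ldots\xi'_u} = \langle x|\,\delta_{\xi_1\xi'_1}\,\delta_{\xi_2\xi'_2}\cdots\delta_{\xi_u\xi'_u}\,|x\rangle. \tag{50}$$
If the $\xi$'s all have discrete eigenvalues, we can use (35) with $v = u$ and no integrals,
and get
$$\begin{aligned}
P_{\xi'_1\ldots\xi'_u} &= \sum_{\xi''_1\ldots\xi''_u} \langle x|\,\delta_{\xi_1\xi'_1}\,\delta_{\xi_2\xi'_2}\cdots\delta_{\xi_u\xi'_u}\,|\xi''_1\ldots\xi''_u\rangle\langle\xi''_1\ldots\xi''_u|x\rangle \\
&= \sum_{\xi''_1\ldots\xi''_u} \langle x|\,\delta_{\xi''_1\xi'_1}\,\delta_{\xi''_2\xi'_2}\cdots\delta_{\xi''_u\xi'_u}\,|\xi''_1\ldots\xi''_u\rangle\langle\xi''_1\ldots\xi''_u|x\rangle \\
&= \langle x|\xi'_1\ldots\xi'_u\rangle\langle\xi'_1\ldots\xi'_u|x\rangle \\
&= |\langle\xi'_1\ldots\xi'_u|x\rangle|^2.
\end{aligned} \tag{51}$$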
We thus get the simple result that the probability of the $\xi$'s having the values $\xi'$ is
just the square of the modulus of the appropriate coordinate of the normalized ket
vector corresponding to the state concerned.

If the $\xi$'s do not all have discrete eigenvalues, but if, say, $\xi_1, \ldots, \xi_v$ have discrete
eigenvalues and $\xi_{v+1}, \ldots, \xi_u$ have continuous eigenvalues, then to get something
physically significant we must obtain the probability of each $\xi_r$ ($r = 1, \ldots, v$)
having a specified value $\xi'_r$ and each $\xi_s$ ($s = v+1, \ldots, u$) lying in a specified small
range $\xi'_s$ to $\xi'_s + d\xi'_s$. For this purpose we must replace each factor $\delta_{\xi_s\xi'_s}$ in (50) by
a factor $\chi_s$, which is that function of the observable $\xi_s$ which is equal to unity for
$\xi_s$ within the range $\xi'_s$ to $\xi'_s + d\xi'_s$ and zero otherwise. Proceeding as before with
the help of (35), we obtain for this probability

$$P_{\xi'_1\ldots\xi'_u}\,d\xi'_{v+1}\ldots d\xi'_u = |\langle\xi'_1\ldots\xi'_u|x\rangle|^2\,d\xi'_{v+1}\ldots d\xi'_u. \tag{52}$$
Thus in every case the probability distribution of values for the $\xi$'s is given by
the square of the modulus of the representative of the normalized ket vector
corresponding to the state concerned.

The numbers which form the representative of a normalized ket (or bra) may 
for this reason be called probability amplitudes. The square of the modulus of 
a probability amplitude is an ordinary probability, or a probability per unit range 
for those variables that have continuous ranges of values. 

We may be interested in a state whose corresponding ket $|x\rangle$ cannot be
normalized. This occurs, for example, if the state is an eigenstate of some
observable belonging to an eigenvalue lying in a range of eigenvalues. The formula
(51) or (52) can then still be used to give the relative probability of the $\xi$'s having
specified values or having values lying in specified small ranges, i.e. it will give
correctly the ratios of the probabilities for different $\xi'$'s. The numbers $\langle\xi'_1\ldots\xi'_u|x\rangle$
may then be called relative probability amplitudes.
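
Formula (51) and the notion of relative probability can be illustrated with a small discrete example. This is a minimal sketch with made-up numbers; in a finite example the ket can of course always be normalized, so the un-normalized list `raw` merely stands in for a ket whose length has been left arbitrary.

```python
# Assumed example: three basic vectors, illustrative amplitudes.
raw = [1 + 1j, 2, -1j]                   # un-normalized representative <xi'|x>

# Normalize the ket so that <x|x> = 1.
norm = sum(abs(c) ** 2 for c in raw) ** 0.5
amplitudes = [c / norm for c in raw]     # probability amplitudes

# Equation (51): probability = square of the modulus of the amplitude.
probabilities = [abs(c) ** 2 for c in amplitudes]

# From the un-normalized ket one gets only relative probabilities,
# whose ratios agree with the ratios of the true probabilities.
relative = [abs(c) ** 2 for c in raw]
ratio_from_relative = relative[1] / relative[2]
ratio_from_probabilities = probabilities[1] / probabilities[2]
```

The list `probabilities` sums to unity up to rounding, while `relative` does not, although both give the same ratios.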

The representation for which the above results hold is characterized by the basic
vectors being simultaneous eigenvectors of all the $\xi$'s. It may also be characterized
by the requirement that each of the $\xi$'s shall be represented by a diagonal matrix,
this condition being easily seen to be equivalent to the previous one. The latter
characterization is usually the more convenient one. For brevity, we shall formulate
it as each of the $\xi$'s 'being diagonal in the representation'.

Provided the $\xi$'s form a complete set of commuting observables,
the representation is completely determined by the characterization, apart from
arbitrary phase factors in the basic vectors. Each basic bra $\langle\xi'_1\ldots\xi'_u|$ may
be multiplied by $e^{i\gamma'}$, where $\gamma'$ is any real function of the variables $\xi'_1, \ldots, \xi'_u$,
without changing any of the conditions which the representation has to satisfy,
i.e. the condition that the $\xi$'s are diagonal or that the basic vectors are simultaneous
eigenvectors of the $\xi$'s, and the fundamental properties of the basic vectors (34)
and (35). With the basic bras changed in this way, the representative $\langle\xi'_1\ldots\xi'_u|P\rangle$
of a ket $|P\rangle$ gets multiplied by $e^{i\gamma'}$, the representative $\langle Q|\xi'_1\ldots\xi'_u\rangle$ of a bra $\langle Q|$ gets
multiplied by $e^{-i\gamma'}$ and the representative $\langle\xi'_1\ldots\xi'_u|\alpha|\xi''_1\ldots\xi''_u\rangle$ of a linear operator
$\alpha$ gets multiplied by $e^{i(\gamma'-\gamma'')}$. The probabilities or relative probabilities (51), (52)
are, of course, unaltered.
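
The effect of the arbitrary phase factors can be checked numerically. The sketch below assumes two discrete basic vectors, with made-up phases `gammas`, a made-up normalized ket and a made-up Hermitian matrix; it confirms that the probabilities, and also the number $\langle x|\alpha|x\rangle$, are unchanged.

```python
import cmath

n = 2
gammas = [0.3, -1.2]                    # assumed real phases gamma', one per basic bra
ket = [0.6, 0.8j]                       # representative <xi'|x> of a normalized ket
alpha = [[1, 2 - 1j], [2 + 1j, 3]]      # representative <xi'|alpha|xi''>

# Representative of the ket: multiplied by exp(i gamma').
new_ket = [cmath.exp(1j * gammas[r]) * ket[r] for r in range(n)]

# Representative of the operator: multiplied by exp(i (gamma' - gamma'')).
new_alpha = [[cmath.exp(1j * (gammas[r] - gammas[s])) * alpha[r][s]
              for s in range(n)] for r in range(n)]

old_probabilities = [abs(c) ** 2 for c in ket]
new_probabilities = [abs(c) ** 2 for c in new_ket]


def expectation(op, v):
    """The number <x|op|x>, built from the representatives of op and |x>."""
    return sum(complex(v[r]).conjugate() * op[r][s] * v[s]
               for r in range(n) for s in range(n))


old_value = expectation(alpha, ket)
new_value = expectation(new_alpha, new_ket)
```

The phase factors cancel in pairs in every quantity that does not refer to a particular choice of basic vectors, which is why `old_value` and `new_value` agree up to rounding.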

The probabilities that one calculates in practical problems in quantum 
mechanics are nearly always obtained from the squares of the moduli of probability 
amplitudes or relative probability amplitudes. Even when one is interested 
only in the probability of an incomplete set of commuting observables having 
specified values, it is usually necessary first to make the set a complete 
one by the introduction of some extra commuting observables and to obtain 
the probability of the complete set having specified values (as the square of 
the modulus of a probability amplitude), and then to sum or integrate over all 
possible values of the extra observables. A more direct application of formula (51) 
of §13 is usually not practicable. 

To introduce a representation in practice:

(i) We look for observables which we would like to have diagonal, 
either because we are interested in their probabilities or for reasons of 
mathematical simplicity; 

(ii) We must see that they all commute—a necessary condition since diagonal 
matrices always commute; 

(iii) We then see that they form a complete commuting set, and if not we add 
some more commuting observables to them to make them into a complete 
commuting set; 

(iv) We set up an orthogonal representation with this complete commuting 
set diagonal. 

The representation is then completely determined except for the arbitrary 
phase factors. For most purposes the arbitrary phase factors are unimportant and 
trivial, so that we may count the representation as being completely determined 
by the observables that are diagonal in it. This fact is already implied in our 
notation, since the only indication in a representative of the representation to 
which it belongs are the letters denoting the observables that are diagonal. 

It may be that we are interested in two representations for the same
dynamical system. Suppose that in one of them the complete set of commuting
observables $\xi_1, \ldots, \xi_u$ are diagonal and the basic bras are $\langle\xi'_1\ldots\xi'_u|$ and in the other
the complete set of commuting observables $\eta_1, \ldots, \eta_w$ are diagonal and the basic
bras are $\langle\eta'_1\ldots\eta'_w|$. A ket $|P\rangle$ will now have the two representatives $\langle\xi'_1\ldots\xi'_u|P\rangle$
and $\langle\eta'_1\ldots\eta'_w|P\rangle$. If $\xi_1, \ldots, \xi_v$ have discrete eigenvalues and $\xi_{v+1}, \ldots, \xi_u$ have
continuous eigenvalues and if $\eta_1, \ldots, \eta_x$ have discrete eigenvalues and $\eta_{x+1}, \ldots, \eta_w$
have continuous eigenvalues, we get from (35)

$$\langle\eta'_1\ldots\eta'_w|P\rangle = \sum_{\xi'_1\ldots\xi'_v}\int\!\cdots\!\int \langle\eta'_1\ldots\eta'_w|\xi'_1\ldots\xi'_u\rangle\,d\xi'_{v+1}\ldots d\xi'_u\,\langle\xi'_1\ldots\xi'_u|P\rangle, \tag{53}$$
and interchanging $\xi$'s and $\eta$'s
$$\langle\xi'_1\ldots\xi'_u|P\rangle = \sum_{\eta'_1\ldots\eta'_x}\int\!\cdots\!\int \langle\xi'_1\ldots\xi'_u|\eta'_1\ldots\eta'_w\rangle\,d\eta'_{x+1}\ldots d\eta'_w\,\langle\eta'_1\ldots\eta'_w|P\rangle. \tag{54}$$

These are the transformation equations which give one representative of $|P\rangle$ in
terms of the other. They show that either representative is expressible linearly in
terms of the other, with the quantities

$$\langle\eta'_1\ldots\eta'_w|\xi'_1\ldots\xi'_u\rangle, \qquad \langle\xi'_1\ldots\xi'_u|\eta'_1\ldots\eta'_w\rangle \tag{55}$$

as coefficients. These quantities are called the transformation functions.
Similar equations may be written down to connect the two representatives of
a bra vector or of a linear operator. The transformation functions (55) are
in every case the means which enable one to pass from one representative to
the other. Each of the transformation functions is the conjugate complex of the
other, and they satisfy the conditions

$$\sum_{\xi'_1\ldots\xi'_v}\int\!\cdots\!\int \langle\eta'_1\ldots\eta'_w|\xi'_1\ldots\xi'_u\rangle\,d\xi'_{v+1}\ldots d\xi'_u\,\langle\xi'_1\ldots\xi'_u|\eta''_1\ldots\eta''_w\rangle
= \delta_{\eta'_1\eta''_1}\cdots\delta_{\eta'_x\eta''_x}\,\delta(\eta'_{x+1}-\eta''_{x+1})\cdots\delta(\eta'_w-\eta''_w) \tag{56}$$
and the corresponding conditions with $\xi$'s and $\eta$'s interchanged, as may be verified
from (35) and (34) and the corresponding equations for the $\eta$'s.

Transformation functions are examples of probability amplitudes or relative
probability amplitudes. Let us take the case when all the $\xi$'s and all the $\eta$'s
have discrete eigenvalues. Then the basic ket $|\eta'_1\ldots\eta'_w\rangle$ is normalized, so that
its representative in the $\xi$-representation, $\langle\xi'_1\ldots\xi'_u|\eta'_1\ldots\eta'_w\rangle$, is a probability
amplitude for each set of values for the $\xi'$'s. The state to which these probability
amplitudes refer, namely the state corresponding to $|\eta'_1\ldots\eta'_w\rangle$, is characterized by
the condition that a simultaneous measurement of $\eta_1, \ldots, \eta_w$ is certain to lead
to the results $\eta'_1, \ldots, \eta'_w$. Thus $|\langle\xi'_1\ldots\xi'_u|\eta'_1\ldots\eta'_w\rangle|^2$ is the probability of the $\xi$'s
having the values $\xi'_1\ldots\xi'_u$ for the state for which the $\eta$'s certainly have the values
$\eta'_1\ldots\eta'_w$. Since
$$|\langle\xi'_1\ldots\xi'_u|\eta'_1\ldots\eta'_w\rangle|^2 = |\langle\eta'_1\ldots\eta'_w|\xi'_1\ldots\xi'_u\rangle|^2,$$
we have the theorem of reciprocity: the probability of the $\xi$'s having the values $\xi'$ for
the state for which the $\eta$'s certainly have the values $\eta'$ is equal to the probability of
the $\eta$'s having the values $\eta'$ for the state for which the $\xi$'s certainly have the values $\xi'$.

If all the $\eta$'s have discrete eigenvalues and some of the $\xi$'s have continuous
eigenvalues, $|\langle\xi'_1\ldots\xi'_u|\eta'_1\ldots\eta'_w\rangle|^2$ still gives the probability distribution of values
for the $\xi$'s for the state for which the $\eta$'s certainly have the values $\eta'$.
If some of the $\eta$'s have continuous eigenvalues, $|\eta'_1\ldots\eta'_w\rangle$ is not normalized and
$|\langle\xi'_1\ldots\xi'_u|\eta'_1\ldots\eta'_w\rangle|^2$ then gives only the relative probability distribution of values
for the $\xi$'s for the state for which the $\eta$'s certainly have the values $\eta'$.
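
The transformation equations (53) and (54), the condition (56) and the theorem of reciprocity can be sketched for the simplest case, in which one $\xi$ and one $\eta$ each have two discrete eigenvalues. The particular transformation function and ket below are made up for illustration, and the array names are assumptions of this sketch.

```python
import math

s = 1 / math.sqrt(2)
n = 2

# xi_eta[x][e] = <xi'|eta'>: column e is the xi-representative of a basic ket |eta'>.
xi_eta = [[s, s],
          [1j * s, -1j * s]]
# eta_xi[e][x] = <eta'|xi'>, the conjugate complex of <xi'|eta'>.
eta_xi = [[complex(xi_eta[x][e]).conjugate() for x in range(n)]
          for e in range(n)]

P_xi = [0.6, 0.8j]                      # representative <xi'|P>, normalized

# Equation (53): pass to the eta-representation.
P_eta = [sum(eta_xi[e][x] * P_xi[x] for x in range(n)) for e in range(n)]
# Equation (54): pass back to the xi-representation.
P_back = [sum(xi_eta[x][e] * P_eta[e] for e in range(n)) for x in range(n)]

# Condition (56): the sum over xi' gives the two-suffix delta symbol.
condition = [[sum(eta_xi[e1][x] * xi_eta[x][e2] for x in range(n))
              for e2 in range(n)] for e1 in range(n)]

# Theorem of reciprocity: |<xi'|eta'>|^2 = |<eta'|xi'>|^2.
reciprocity = all(
    abs(abs(xi_eta[x][e]) ** 2 - abs(eta_xi[e][x]) ** 2) < 1e-12
    for x in range(n) for e in range(n))
```

In this example `P_back` reproduces `P_xi` and `condition` is the unit matrix, both up to rounding, and the squares of the moduli of `P_eta` again sum to unity.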


19. Theorems about functions of observables 
We shall illustrate the mathematical value of representations by using them to 
prove some theorems. 


THEOREM 1. A linear operator that commutes with an observable $\xi$ commutes
also with any function of $\xi$.

The theorem is obviously true when the function is expressible as a power series.
To prove it generally, let $\omega$ be the linear operator, so that we have the equation

$$\xi\omega - \omega\xi = 0. \tag{57}$$

Let us introduce a representation in which $\xi$ is diagonal. If $\xi$ by itself does not
form a complete commuting set of observables, we must make it into a complete
commuting set by adding certain observables, $\beta$ say, to it, and then take
the representation in which $\xi$ and the $\beta$'s are diagonal. (The case when $\xi$ does
form a complete commuting set by itself can be looked upon as a special case of
the preceding one with the number of $\beta$ variables zero.) In this representation
equation (57) becomes
$$\langle\xi'\beta'|\xi\omega - \omega\xi|\xi''\beta''\rangle = 0,$$
which reduces to
$$\xi'\,\langle\xi'\beta'|\omega|\xi''\beta''\rangle - \langle\xi'\beta'|\omega|\xi''\beta''\rangle\,\xi'' = 0.$$
In the case when the eigenvalues of $\xi$ are discrete, this equation shows that all
the matrix elements $\langle\xi'\beta'|\omega|\xi''\beta''\rangle$ of $\omega$ vanish except those for which $\xi' = \xi''$.
In the case when the eigenvalues of $\xi$ are continuous it shows, like equation (48),
that $\langle\xi'\beta'|\omega|\xi''\beta''\rangle$ is of the form
$$\langle\xi'\beta'|\omega|\xi''\beta''\rangle = c\,\delta(\xi'-\xi''),$$
where $c$ is some function of $\xi'$ and the $\beta'$'s and $\beta''$'s. In either case we may say
that the matrix representing $\omega$ 'is diagonal with respect to $\xi$'. If $f(\xi)$ denotes any
function of $\xi$ in accordance with the general theory of §11, which requires $f(\xi'')$
to be defined for $\xi''$ any eigenvalue of $\xi$, we can deduce in either case

$$f(\xi')\,\langle\xi'\beta'|\omega|\xi''\beta''\rangle - \langle\xi'\beta'|\omega|\xi''\beta''\rangle\,f(\xi'') = 0.$$
This gives
$$\langle\xi'\beta'|f(\xi)\omega - \omega f(\xi)|\xi''\beta''\rangle = 0,$$
so that
$$f(\xi)\omega - \omega f(\xi) = 0$$
and the theorem is proved.
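
Theorem 1 can be illustrated with a small matrix example in a representation with $\xi$ diagonal. The sketch below assumes a $\xi$ with a doubly degenerate eigenvalue, so that an extra label (the $\beta$ of the proof) distinguishes the first two basic vectors; the function `f`, the helper names and all matrix entries are made up for illustration.

```python
def matmul(a, b):
    """Usual rule for multiplying square matrices."""
    n = len(a)
    return [[sum(a[r][t] * b[t][s] for t in range(n)) for s in range(n)]
            for r in range(n)]


def diagonal(values):
    """Diagonal matrix with the given diagonal elements."""
    n = len(values)
    return [[values[r] if r == s else 0 for s in range(n)] for r in range(n)]


xi_values = [1, 1, 2]                   # assumed eigenvalues; the value 1 is degenerate
xi = diagonal(xi_values)

# omega connects only basic vectors with the same eigenvalue of xi,
# i.e. it is diagonal with respect to xi.
omega = [[0, 1j, 0], [-1j, 0, 0], [0, 0, 7]]


def f(x):
    """An arbitrary illustrative function of the eigenvalue."""
    return x ** 3 - 2


f_xi = diagonal([f(v) for v in xi_values])

commutes_with_xi = matmul(xi, omega) == matmul(omega, xi)
commutes_with_f_xi = matmul(f_xi, omega) == matmul(omega, f_xi)

# A matrix connecting different eigenvalues of xi does not commute with xi.
mixing = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
mixing_commutes = matmul(xi, mixing) == matmul(mixing, xi)
```

Both `commutes_with_xi` and `commutes_with_f_xi` come out true, while `mixing_commutes` comes out false, since `mixing` has non-vanishing elements between different eigenvalues of $\xi$.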

As a special case of the theorem, we have the result that any observable
that commutes with an observable $\xi$ also commutes with any function of $\xi$.
This result appears as a physical necessity when we identify, as in §13, the condition
of commutability of two observables with the condition of compatibility of
the corresponding observations. Any observation that is compatible with
the measurement of an observable $\xi$ must also be compatible with the measurement
of $f(\xi)$, since any measurement of $\xi$ includes in itself a measurement of $f(\xi)$.

THEOREM 2. A linear operator that commutes with each of a complete set of
commuting observables is a function of those observables.

Let $\omega$ be the linear operator and $\xi_1, \xi_2, \ldots, \xi_u$ the complete set of commuting
observables, and set up a representation with these observables diagonal. Since $\omega$
commutes with each of the $\xi$'s, the matrix representing it is diagonal with respect to
each of the $\xi$'s, by the argument we had above. This matrix is therefore a diagonal
matrix and is of the form (49), involving a number $c'$ which is a function of the $\xi'$'s.
It thus represents the function of the $\xi$'s that $c'$ is of the $\xi'$'s, and hence $\omega$ equals
this function of the $\xi$'s.

THEOREM 3. If an observable $\xi$ and a linear operator $g$ are such that any linear
operator that commutes with $\xi$ also commutes with $g$, then $g$ is a function of $\xi$.

This is the converse of Theorem 1. To prove it, we use the same representation
with $\xi$ diagonal as we had for Theorem 1. In the first place, we see that $g$ must
commute with $\xi$ itself, and hence the representative of $g$ must be diagonal with
respect to $\xi$, i.e. it must be of the form
$$\langle\xi'\beta'|g|\xi''\beta''\rangle = a(\xi', \beta', \beta'')\,\delta_{\xi'\xi''} \quad\text{or}\quad a(\xi', \beta', \beta'')\,\delta(\xi'-\xi''),$$
according to whether $\xi$ has discrete or continuous eigenvalues. Now let $\omega$ be any
linear operator that commutes with $\xi$, so that its representative is of the form
$$\langle\xi'\beta'|\omega|\xi''\beta''\rangle = b(\xi', \beta', \beta'')\,\delta_{\xi'\xi''} \quad\text{or}\quad b(\xi', \beta', \beta'')\,\delta(\xi'-\xi'').$$
By hypothesis $\omega$ must also commute with $g$, so that
$$\langle\xi'\beta'|g\omega - \omega g|\xi''\beta''\rangle = 0. \tag{58}$$
If we suppose for definiteness that the $\beta$'s have discrete eigenvalues, (58) leads,
with the help of the law of matrix multiplication, to
$$\sum_{\beta'''}\{a(\xi', \beta', \beta''')\,b(\xi', \beta''', \beta'') - b(\xi', \beta', \beta''')\,a(\xi', \beta''', \beta'')\} = 0, \tag{59}$$
the left-hand side of (58) being equal to the left-hand side of (59) multiplied by
$\delta_{\xi'\xi''}$ or $\delta(\xi' - \xi'')$. Equation (59) must hold for all functions $b(\xi', \beta', \beta'')$. We can
deduce that
$$a(\xi', \beta', \beta'') = 0 \quad\text{for } \beta' \neq \beta'',$$
$$a(\xi', \beta', \beta') = a(\xi', \beta'', \beta'').$$
The first of these results shows that the matrix representing $g$ is diagonal and
the second shows that $a(\xi', \beta', \beta')$ is a function of $\xi'$ only. We can now infer that $g$
is that function of $\xi$ which $a(\xi', \beta', \beta')$ is of $\xi'$, so the theorem is proved. The proof
is analogous if some of the $\beta$'s have continuous eigenvalues.

Theorems 1 and 3 are still valid if we replace the observable $\xi$ by any set
of commuting observables $\xi_1, \xi_2, \ldots, \xi_r$, only formal changes being needed in
the proofs.


20. Developments in notation 

The theory of representations that we have developed provides a general system for
labelling kets and bras. In a representation in which the complete set of commuting
observables $\xi_1, \ldots, \xi_u$ are diagonal any ket $|P\rangle$ will have a representative
$\langle\xi'_1\ldots\xi'_u|P\rangle$, or $\langle\xi'|P\rangle$ for brevity. This representative is a definite function of
the variables $\xi'$, say $\psi(\xi')$. The function $\psi$ then determines the ket $|P\rangle$ completely,
so it may be used to label this ket, to replace the arbitrary label $P$. In symbols,
$$\left.\begin{aligned} \text{if}\quad & \langle\xi'|P\rangle = \psi(\xi') \\ \text{we put}\quad & |P\rangle = |\psi(\xi)\rangle. \end{aligned}\right\} \tag{60}$$


We must put $|P\rangle$ equal to $|\psi(\xi)\rangle$ and not $|\psi(\xi')\rangle$, since it does not depend on
a particular set of eigenvalues for the $\xi$'s, but only on the form of the function $\psi$.

With $f(\xi)$ any function of the observables $\xi_1, \ldots, \xi_u$, $f(\xi)|P\rangle$ will have as
its representative
$$\langle\xi'|f(\xi)|P\rangle = f(\xi')\,\psi(\xi').$$
Thus according to (60) we put
$$f(\xi)|P\rangle = |f(\xi)\psi(\xi)\rangle.$$
With the help of the second of equations (60) we now get
$$f(\xi)|\psi(\xi)\rangle = |f(\xi)\psi(\xi)\rangle. \tag{61}$$
This is a general result holding for any functions $f$ and $\psi$ of the $\xi$'s, and it shows
that the vertical line $|$ is not necessary with the new notation for a ket: either side
of (61) may be written simply as $f(\xi)\psi(\xi)\rangle$. Thus the rule for the new notation
becomes:
$$\left.\begin{aligned} \text{if}\quad & \langle\xi'|P\rangle = \psi(\xi') \\ \text{we put}\quad & |P\rangle = \psi(\xi)\rangle. \end{aligned}\right\} \tag{62}$$
We may further shorten $\psi(\xi)\rangle$ to $\psi\rangle$, leaving the variables $\xi$ understood, if no
ambiguity arises thereby.

The ket $\psi(\xi)\rangle$ may be considered as the product of the linear operator $\psi(\xi)$
with a ket which is denoted simply by $\rangle$ without a label. We call the ket $\rangle$
the standard ket. Any ket whatever can be expressed as a function of the $\xi$'s
multiplied into the standard ket. For example, taking $|P\rangle$ in (62) to be the basic
ket $|\xi''\rangle$, we find
$$|\xi''\rangle = \delta_{\xi_1\xi''_1}\cdots\delta_{\xi_v\xi''_v}\,\delta(\xi_{v+1}-\xi''_{v+1})\cdots\delta(\xi_u-\xi''_u)\rangle \tag{63}$$
in the case when $\xi_1, \ldots, \xi_v$ have discrete eigenvalues and $\xi_{v+1}, \ldots, \xi_u$ have
continuous eigenvalues. The standard ket is characterized by the condition that
its representative $\langle\xi'|\rangle$ is unity over the whole domain of the variable $\xi'$, as may
be seen by putting $\psi = 1$ in (62).
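
The bookkeeping behind (61), (62) and (63) can be sketched for one $\xi$ with a few discrete eigenvalues, a ket being stored as the list of values of its representative. The eigenvalues, the functions `psi` and `f`, and the helper names below are all made up for illustration.

```python
eigenvalues = [-1, 0, 1]                # assumed discrete eigenvalues xi'


def representative(psi):
    """The numbers <xi'|psi(xi)> = psi(xi'), one for each eigenvalue xi'."""
    return [psi(x) for x in eigenvalues]


def apply_function(f, rep):
    """Representative of f(xi)|P>: each <xi'|P> is multiplied by f(xi')."""
    return [f(x) * c for x, c in zip(eigenvalues, rep)]


def psi(x):
    """An illustrative function labelling the ket."""
    return 1 + 2j * x


def f(x):
    """An illustrative function of the observable."""
    return x * x + 1


# Equation (61): f(xi) acting on |psi(xi)> gives the ket labelled f(xi)psi(xi).
lhs = apply_function(f, representative(psi))
rhs = representative(lambda x: f(x) * psi(x))

# The standard ket has psi = 1, so its representative is unity everywhere.
standard_ket = representative(lambda x: 1)

# Equation (63) for the basic ket with xi'' = 0: a two-suffix delta symbol.
basic_ket = representative(lambda x: 1 if x == 0 else 0)
```

Here `lhs` and `rhs` are the same list, `standard_ket` is a list of ones, and `basic_ket` has a single non-vanishing element.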

A further contraction may be made in the notation, namely to leave
the symbol $\rangle$ for the standard ket understood. A ket is then written simply
as $\psi(\xi)$, a function of the observables $\xi$. A function of the $\xi$'s used in this way
to denote a ket is called a wave function.* The system of notation provided by
wave functions is the one usually used by most authors for calculations in quantum
mechanics. In using it one should remember that each wave function is understood
to have the standard ket multiplied into it on the right, which prevents one from
multiplying the wave function by any operator on the right. Wave functions can
be multiplied by operators only on the left. This distinguishes them from ordinary
functions of the $\xi$'s, which are operators and can be multiplied by operators on
either the left or the right. A wave function is just the representative of a ket
expressed as a function of the observables $\xi$, instead of eigenvalues $\xi'$ for those
observables. The square of its modulus gives the probability (or the relative
probability, if it is not normalized) of the $\xi$'s having specified values, or lying
in specified small ranges, for the corresponding state.

*The reason for the name is that in the early days of quantum mechanics all the examples
of these functions were of the form of waves. The name is not a descriptive one from the point
of view of the modern general theory.

The new notation for bras may be developed in the same way as for kets.
A bra $\langle Q|$ whose representative $\langle Q|\xi'\rangle$ is $\phi(\xi')$ we write $\langle\phi(\xi)|$.† With this
notation the conjugate imaginary to $|\psi(\xi)\rangle$ is $\langle\bar\psi(\xi)|$. Thus the rule that we
have used hitherto, that a ket and its conjugate imaginary bra are both specified
by the same label, must be extended to read: if the labels of a ket involve complex
numbers or complex functions, the labels of the conjugate imaginary bra involve
the conjugate complex numbers or functions. As in the case of kets we can
show that $\langle\phi(\xi)|f(\xi)$ and $\langle\phi(\xi)f(\xi)|$ are the same, so that the vertical line can
be omitted. We can consider $\langle\phi(\xi)$ as the product of the linear operator $\phi(\xi)$
into the standard bra $\langle$, which is the conjugate imaginary of the standard ket $\rangle$.
We may leave the standard bra understood, so that a general bra is written as
$\phi(\xi)$, the conjugate complex of a wave function. The conjugate complex of a wave
function can be multiplied by any linear operator on the right, but cannot be
multiplied by a linear operator on the left. We can construct triple products of
the form $\langle f(\xi)\rangle$. Such a triple product is a number, equal to $f(\xi)$ summed or
integrated over the whole domain of eigenvalues for the $\xi$’s,
$$\langle f(\xi)\rangle = \sum_{\xi_1'\cdots\xi_v'}\int\cdots\int f(\xi')\,d\xi_{v+1}'\cdots d\xi_u' \tag{64}$$
in the case when $\xi_1, \ldots, \xi_v$ have discrete eigenvalues and $\xi_{v+1}, \ldots, \xi_u$ have
continuous eigenvalues.

The standard ket and bra are defined with respect to a representation. 
If we carried through the above work with a different representation in which 
the complete set of commuting observables $\eta$ are diagonal, or if we merely changed
the phase factors in the representation with the $\xi$’s diagonal, we should get
a different standard ket and bra. In a piece of work in which more than one 
standard ket or bra appears one must, of course, distinguish them by giving 
them labels. 

A further development of the notation which is of great importance for dealing
with complicated dynamical systems will now be discussed. Suppose we have
a dynamical system describable in terms of dynamical variables which can
all be divided into two sets, set A and set B say, such that any member of
set A commutes with any member of set B. A general dynamical variable
must be expressible as a function of the A-variables and B-variables together.
We may consider another dynamical system in which the dynamical variables are
the A-variables only; let us call it the A-system. Similarly we may consider a third
dynamical system in which the dynamical variables are the B-variables only,
the B-system. The original system can then be looked upon as a combination
of the A-system and the B-system in accordance with the mathematical scheme
given below.

†[The original ‘$\langle\phi(\xi')|$’ is corrected in the fourth edition for, like the ket $|P\rangle$ and $\psi$, the bra
is represented by the form of the function $\phi$ and not the set of eigenvalues $\xi'$.]

Let us take any ket $|a\rangle$ for the A-system and any ket $|b\rangle$ for the B-system.
We assume that they have a product $|a\rangle|b\rangle$ for which the commutative and
distributive axioms of multiplication hold, i.e.
$$|a\rangle|b\rangle = |b\rangle|a\rangle,$$
$$\{c_1|a_1\rangle + c_2|a_2\rangle\}|b\rangle = c_1|a_1\rangle|b\rangle + c_2|a_2\rangle|b\rangle,$$
$$|a\rangle\{c_1|b_1\rangle + c_2|b_2\rangle\} = c_1|a\rangle|b_1\rangle + c_2|a\rangle|b_2\rangle,$$
the $c$’s being numbers. We can give a meaning to any A-variable operating
on the product $|a\rangle|b\rangle$ by assuming that it operates only on the $|a\rangle$ factor and
commutes with the $|b\rangle$ factor, and similarly we can give a meaning to any
B-variable operating on this product by assuming that it operates only on the $|b\rangle$
factor and commutes with the $|a\rangle$ factor. (This makes every A-variable commute
with every B-variable.) Thus any dynamical variable of the original system can
operate on the product $|a\rangle|b\rangle$, so this product can be looked upon as a ket for
the original system, and may then be written $|ab\rangle$, the two labels $a$ and $b$ being
sufficient to specify it. In this way we get the fundamental equations
$$|a\rangle|b\rangle = |b\rangle|a\rangle = |ab\rangle. \tag{65}$$

The multiplication here is of quite a different kind from any that occurs earlier 
in the theory. The ket vectors $|a\rangle$ and $|b\rangle$ are in two different vector spaces and
their product is in a third vector space, which may be called the product of the two 
previous vector spaces. The number of dimensions of the product space is equal 
to the product of the number of dimensions of each of the factor spaces. A general 
ket vector of the product space is not of the form (65), but is a sum or integral of 
kets of this form. 

Let us take a representation for the A-system in which a complete set of
commuting observables $\xi_A$ of the A-system are diagonal. We shall then have
the basic bras $\langle\xi_A'|$ for the A-system. Similarly, taking a representation for
the B-system with the observables $\xi_B$ diagonal, we shall have the basic bras $\langle\xi_B'|$
for the B-system. The products
$$\langle\xi_A'|\langle\xi_B'| = \langle\xi_A'\xi_B'| \tag{66}$$
will then provide the basic bras for a representation for the original system,
in which representation the $\xi_A$’s and the $\xi_B$’s will be diagonal. The $\xi_A$’s and $\xi_B$’s
will together form a complete set of commuting observables for the original system.
From (65) and (66) we get
$$\langle\xi_A'|a\rangle\langle\xi_B'|b\rangle = \langle\xi_A'\xi_B'|ab\rangle, \tag{67}$$
showing that the representative of $|ab\rangle$ equals the product of the representatives
of $|a\rangle$ and of $|b\rangle$ in their respective representations.
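
The content of (65) and (67) can be tried out in a finite-dimensional setting. The following is only a sketch under assumptions that are not in the text: both component systems are taken to have purely discrete representations, kets are stored as plain lists of complex amplitudes, and the names `product_ket` and `is_product` are hypothetical helpers introduced for the illustration. It shows that the representative of the product ket is the product of the representatives, that the number of components multiplies, and that a sum of two product kets need not itself be of the product form.

```python
# Illustrative sketch only: kets are plain lists of complex amplitudes in
# finite-dimensional, discrete representations (an assumption of this example).

def product_ket(a, b):
    """Representative of |ab>: the entry labelled (i, j) is a[i] * b[j], as in (67)."""
    return [[ai * bj for bj in b] for ai in a]

def is_product(m, tol=1e-12):
    """True if the two-index array m factorizes as a[i] * b[j].

    A two-index array factorizes exactly when all its 2x2 minors vanish.
    """
    rows, cols = len(m), len(m[0])
    return all(
        abs(m[i][j] * m[k][l] - m[i][l] * m[k][j]) <= tol
        for i in range(rows) for k in range(rows)
        for j in range(cols) for l in range(cols)
    )

a = [1.0, 2.0j]           # a ket of a 2-dimensional A-system
b = [0.5, -1.0, 3.0]      # a ket of a 3-dimensional B-system
ab = product_ket(a, b)

# Number of components of a ket in the product space: 2 * 3.
dimension = sum(len(row) for row in ab)

# A sum of two product kets that is not itself a product ket.
e0, e1 = [1.0, 0.0], [0.0, 1.0]
f0, f1 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
first, second = product_ket(e0, f0), product_ket(e1, f1)
mixed = [[first[i][j] + second[i][j] for j in range(3)] for i in range(2)]

print(dimension, is_product(ab), is_product(mixed))
```

The last line of the sketch reports that `ab` factorizes while `mixed` does not, which is the statement that a general ket of the product space is a sum of kets of the form (65).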
We can introduce the standard ket, $\rangle_A$ say, for the A-system, with respect
to the representation with the $\xi_A$’s diagonal, and also the standard ket $\rangle_B$
for the B-system, with respect to the representation with the $\xi_B$’s diagonal.
Their product $\rangle_A\rangle_B$ is then the standard ket for the original system, with respect
to the representation with the $\xi_A$’s and $\xi_B$’s diagonal. Any ket for the original
system may be expressed as
$$\psi(\xi_A\xi_B)\rangle_A\rangle_B. \tag{68}$$
It may be that in a certain calculation we wish to use a particular representation
for the B-system, say the above representation with the $\xi_B$’s diagonal, but do not
wish to introduce any particular representation for the A-system. It would then
be convenient to use the standard ket $\rangle_B$ for the B-system and no standard ket for
the A-system. Under these circumstances we could write any ket for the original
system as
$$|\xi_B\rangle\rangle_B, \tag{69}$$
in which $|\xi_B\rangle$ is a ket for the A-system and is also a function of the $\xi_B$’s, i.e. it is
a ket for the A-system for each set of values for the $\xi_B$’s. In fact (69) equals (68)
if we take
$$|\xi_B\rangle = \psi(\xi_A\xi_B)\rangle_A.$$
We may leave the standard ket $\rangle_B$ in (69) understood, and then we have the general
ket for the original system appearing as $|\xi_B\rangle$, a ket for the A-system and a wave
function in the variables $\xi_B$ of the B-system. An example of this notation will be
used in §§66 and 79.

The above work can be immediately extended to a dynamical system
describable in terms of dynamical variables which can be divided into three or
more sets A, B, C, ... such that any member of one set commutes with any
member of another. Equation (65) gets generalized to
$$|a\rangle|b\rangle|c\rangle\cdots = |abc\ldots\rangle,$$
the factors on the left being kets for the component systems and the ket on the right
being a ket for the original system. Equations (66), (67) and (68) get generalized
to many factors in a similar way.


IV. THE QUANTUM CONDITIONS

21. Poisson brackets

Our work so far has consisted in setting up a general mathematical scheme
connecting states and observables in quantum mechanics. One of the dominant 
features of this scheme is that observables, and dynamical variables in general, 
appear in it as quantities which do not obey the commutative law of multiplication. 
It now becomes necessary for us to obtain equations to replace the commutative 
law of multiplication, equations that will tell us the value of $\xi\eta - \eta\xi$ when $\xi$ and
$\eta$ are any two observables or dynamical variables. Only when such equations
are known shall we have a complete scheme of mechanics with which to replace 
classical mechanics. These new equations are called quantum conditions or 
commutation relations. 

The problem of finding quantum conditions is not of such a general character 
as those we have been concerned with up to the present. It is instead a special 
problem which presents itself with each particular dynamical system one is 
called upon to study. There is, however, a fairly general method of obtaining 
quantum conditions, applicable to a very large class of dynamical systems. 
This is the method of classical analogy and will form the main theme of 
the present chapter. Those dynamical systems to which this method is not 
applicable must be treated individually and special considerations used in 
each case. 

The value of classical analogy in the development of quantum mechanics 
depends on the fact that classical mechanics provides a valid description of 
dynamical systems under certain conditions, when the particles and bodies 
composing the systems are sufficiently massive for the disturbance accompanying 
an observation to be negligible. Classical mechanics must therefore be a limiting 
case of quantum mechanics. We should thus expect to find that important concepts 
in classical mechanics correspond to important concepts in quantum mechanics, 
and, from an understanding of the general nature of the analogy between classical 
and quantum mechanics, we may hope to get laws and theorems in quantum 
mechanics appearing as simple generalizations of well-known results in classical 
mechanics; in particular we may hope to get the quantum conditions appearing as 
a simple generalization of the classical law that all dynamical variables commute. 
Let us take a dynamical system composed of a number of particles in
interaction. As independent dynamical variables for dealing with the system
we may use the Cartesian coordinates of all the particles and the corresponding
Cartesian components of velocity of the particles. It is, however, more convenient
to work with the momentum components instead of the velocity components.
Let us call the coordinates $q_r$, $r$ going from 1 to three times the number of particles,
and the corresponding momentum components $p_r$. The $q$’s and $p$’s are called
canonical coordinates and momenta.

The method of Lagrange’s equations of motion involves introducing 
coordinates $q_r$ and momenta $p_r$ in a more general way, applicable also for
a system not composed of particles (e.g. a system containing rigid bodies). 
These more general q’s and p’s are also called canonical coordinates and momenta. 
Any dynamical variable is expressible in terms of a set of canonical coordinates 
and momenta. 

An important concept in general dynamical theory is the Poisson Bracket.
Any two dynamical variables $u$ and $v$ have a P.B. (Poisson Bracket) which we shall
denote by $[u,v]$, defined by
$$[u,v] = \sum_r\left\{\frac{\partial u}{\partial q_r}\frac{\partial v}{\partial p_r} - \frac{\partial u}{\partial p_r}\frac{\partial v}{\partial q_r}\right\}, \tag{1}$$
$u$ and $v$ being regarded as functions of a set of canonical coordinates and momenta
$q_r$ and $p_r$ for the purpose of the differentiations. The right-hand side of (1) is
independent of which set of canonical coordinates and momenta are used, this being
a consequence of the general definition of canonical coordinates and momenta,
so the P.B. $[u,v]$ is well defined.

The main properties of P.B.s, which follow at once from their definition (1), are
$$[u,v] = -[v,u], \tag{2}$$
$$[u,c] = 0, \tag{3}$$
where $c$ is a number (which may be considered as a special case of a dynamical
variable),
$$[u_1+u_2, v] = [u_1,v] + [u_2,v], \qquad [u, v_1+v_2] = [u,v_1] + [u,v_2], \tag{4}$$
$$[u_1u_2, v] = \sum_r\left\{\left(\frac{\partial u_1}{\partial q_r}u_2 + u_1\frac{\partial u_2}{\partial q_r}\right)\frac{\partial v}{\partial p_r} - \left(\frac{\partial u_1}{\partial p_r}u_2 + u_1\frac{\partial u_2}{\partial p_r}\right)\frac{\partial v}{\partial q_r}\right\} = [u_1,v]u_2 + u_1[u_2,v],$$
$$[u, v_1v_2] = [u,v_1]v_2 + v_1[u,v_2]. \tag{5}$$
Also the identity
$$[u,[v,w]] + [v,[w,u]] + [w,[u,v]] = 0 \tag{6}$$
is easily verified. Equations (4) express that the P.B. $[u,v]$ involves $u$ and $v$ linearly,
while equations (5) correspond to the ordinary rules for differentiating a product.
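
The definition (1) and the properties (2) and (5) may be checked numerically at a chosen point of phase space. The sketch below is an illustration under stated assumptions, not part of the original argument: the partial derivatives are approximated by central differences, the system is given two degrees of freedom, and the function name `poisson_bracket` together with the sample variables `u1`, `u2`, `v` are hypothetical choices made for the example.

```python
import math

def poisson_bracket(u, v, q, p, h=1e-5):
    """Classical P.B. (1) at the point (q, p), using central differences.

    u and v are functions u(q, p) of two lists of equal length.
    """
    def partial(f, which, r):
        def shifted(step):
            qs, ps = list(q), list(p)
            target = qs if which == "q" else ps
            target[r] += step
            return f(qs, ps)
        return (shifted(h) - shifted(-h)) / (2.0 * h)

    return sum(
        partial(u, "q", r) * partial(v, "p", r)
        - partial(u, "p", r) * partial(v, "q", r)
        for r in range(len(q))
    )

# Canonical variables themselves.
def q0(q, p): return q[0]
def p0(q, p): return p[0]
def p1(q, p): return p[1]

# Arbitrary sample dynamical variables (hypothetical, for illustration only).
def u1(q, p): return q[0] * p[1]
def u2(q, p): return math.sin(q[1]) + p[0]
def v(q, p): return p[0] ** 2 * q[1] + q[0]
def u1u2(q, p): return u1(q, p) * u2(q, p)

Q, P = [0.3, -1.2], [0.7, 0.4]

# First of equations (5): [u1 u2, v] = [u1, v] u2 + u1 [u2, v].
lhs = poisson_bracket(u1u2, v, Q, P)
rhs = (poisson_bracket(u1, v, Q, P) * u2(Q, P)
       + u1(Q, P) * poisson_bracket(u2, v, Q, P))

print(poisson_bracket(q0, p0, Q, P), poisson_bracket(q0, p1, Q, P))
print(lhs, rhs)
```

Within the accuracy of the finite differences, the first printed line gives the elementary brackets of a coordinate with its own and with a different momentum, and the second shows the two sides of the product rule agreeing.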
Let us try to introduce a quantum P.B. which shall be the analogue of
the classical one. We assume the quantum P.B. to satisfy all the conditions (2)
to (6), it being now necessary that the order of the factors $u_1$ and $u_2$ in the first of
equations (5) should be preserved throughout the equation, as in the way we have
here written it, and similarly for the $v_1$ and $v_2$ in the second of equations (5).
These conditions are already sufficient to determine the form of the quantum
P.B. uniquely, as may be seen from the following argument. We can evaluate
the P.B. $[u_1u_2, v_1v_2]$ in two different ways, since we can use either of the two
formulae (5) first,
$$\begin{aligned}
[u_1u_2, v_1v_2] &= [u_1, v_1v_2]u_2 + u_1[u_2, v_1v_2] \\
&= \{[u_1,v_1]v_2 + v_1[u_1,v_2]\}u_2 + u_1\{[u_2,v_1]v_2 + v_1[u_2,v_2]\} \\
&= [u_1,v_1]v_2u_2 + v_1[u_1,v_2]u_2 + u_1[u_2,v_1]v_2 + u_1v_1[u_2,v_2]
\end{aligned}$$
and
$$\begin{aligned}
[u_1u_2, v_1v_2] &= [u_1u_2, v_1]v_2 + v_1[u_1u_2, v_2] \\
&= [u_1,v_1]u_2v_2 + u_1[u_2,v_1]v_2 + v_1[u_1,v_2]u_2 + v_1u_1[u_2,v_2].
\end{aligned}$$
Equating these two results, we obtain
$$[u_1,v_1](u_2v_2 - v_2u_2) = (u_1v_1 - v_1u_1)[u_2,v_2].$$
Since this condition holds with $u_1$ and $v_1$ quite independent of $u_2$ and $v_2$,
we must have
$$u_1v_1 - v_1u_1 = i\hbar[u_1,v_1],$$
$$u_2v_2 - v_2u_2 = i\hbar[u_2,v_2],$$
where $\hbar$ must not depend on $u_1$ and $v_1$, nor on $u_2$ and $v_2$, and also must commute
with $(u_1v_1 - v_1u_1)$. It follows that $\hbar$ must be simply a number. We want the P.B. of
two real variables to be real, as in the classical theory, which requires, from
the work on p. 24*, that $\hbar$ shall be a real number when introduced, as here,
with the coefficient $i$. We are thus led to the following definition for the quantum
P.B. $[u,v]$ of any two variables $u$ and $v$,
$$uv - vu = i\hbar[u,v], \tag{7}$$
in which $\hbar$ is a new universal constant. It has the dimensions of action. In order
that the theory may agree with experiment, we must take $\hbar$ equal to $h/2\pi$,
where $h$ is the universal constant that was introduced by Planck, known as
Planck’s constant. It is easily verified that the quantum P.B. satisfies all
the conditions (2), (3), (4), (5) and (6).
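
The verification that the quantum P.B. defined by (7) satisfies conditions (2), (5) and (6) is purely algebraic, so it can be tried with any non-commuting quantities. The sketch below uses small complex matrices as stand-ins for dynamical variables. This is an assumption of the example only (the text does not tie the variables to matrices), as are the choice of units with $\hbar = 1$ and the helper names `mul`, `add`, `sub`, `qpb` and `close`.

```python
HBAR = 1.0  # assumption of this sketch: units in which hbar = 1

def mul(a, b):
    """Matrix product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def qpb(u, v):
    """Quantum P.B. from (7): [u, v] = (uv - vu) / (i hbar)."""
    return [[x / (1j * HBAR) for x in row] for row in sub(mul(u, v), mul(v, u))]

def close(a, b, tol=1e-9):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Arbitrary non-commuting 2x2 matrices (hypothetical sample values).
u1 = [[1, 2j], [0, -1]]
u2 = [[0, 1], [1, 0]]
v = [[2, 1], [1j, 3]]
w = [[1, 1], [0, 2]]
zero = [[0, 0], [0, 0]]

# Condition (2): antisymmetry.
antisymmetry_ok = close(qpb(u1, v), [[-x for x in row] for row in qpb(v, u1)])

# First of conditions (5), with the order of the factors preserved.
product_rule_ok = close(
    qpb(mul(u1, u2), v),
    add(mul(qpb(u1, v), u2), mul(u1, qpb(u2, v))),
)

# Condition (6): the Jacobi identity.
jacobi = add(add(qpb(u1, qpb(v, w)), qpb(v, qpb(w, u1))), qpb(w, qpb(u1, v)))
jacobi_ok = close(jacobi, zero)

# The order of factors in (5) cannot be altered freely.
order_matters = not close(mul(qpb(u1, v), u2), mul(u2, qpb(u1, v)))

print(antisymmetry_ok, product_rule_ok, jacobi_ok, order_matters)
```

The final flag illustrates why the order of the factors in (5) has to be preserved once the variables no longer commute.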

The problem of finding quantum conditions now reduces to the problem 
of determining P.B.s in quantum mechanics. The strong analogy between 
the quantum P.B. defined by (7) and the classical P.B. defined by (1) leads us 
to make the assumption that the quantum P.B.s, or at any rate the simpler ones 
of them, have the same values as the corresponding classical P.B.s. The simplest 
P.B.s are those involving the canonical coordinates and momenta themselves and 
have the following values in the classical theory: 


$$[q_r,q_s] = 0, \qquad [p_r,p_s] = 0, \qquad [q_r,p_s] = \delta_{rs}. \tag{8}$$


*[about the commutability of operators where $\xi$ and $\eta$ give a real number when combined as
$i(\xi\eta - \eta\xi)$]
We therefore assume that the corresponding quantum P.B.s also have the values
given by (8). By eliminating the quantum P.B.s with the help of (7), we obtain
the equations
$$q_rq_s - q_sq_r = 0, \qquad p_rp_s - p_sp_r = 0, \qquad q_rp_s - p_sq_r = i\hbar\,\delta_{rs}, \tag{9}$$
which are the fundamental quantum conditions. They show us where the lack of
commutability among the canonical coordinates and momenta lies. They also
provide us with a basis for calculating commutation relations between other
dynamical variables. For instance, if $\xi$ and $\eta$ are any two functions of the $q$’s
and $p$’s expressible as power series, we may express $\xi\eta - \eta\xi$ or $[\xi,\eta]$, by repeated
applications of the laws (2), (3), (4) and (5), in terms of the elementary P.B.s
given in (8) and so evaluate it. The result is often, in simple cases, the same
as the classical result, or departs from the classical result only through requiring
a special order for factors in a product, this order being, of course, unimportant in
the classical theory. Even when $\xi$ and $\eta$ are more general functions of the $q$’s and
$p$’s not expressible as power series, equations (9) are still sufficient to fix the value
of $\xi\eta - \eta\xi$, as will become clear from the following work. Equations (9) thus
give the solution of the problem of finding the quantum conditions, for all those
dynamical systems which have a classical analogue and which are describable in
terms of canonical coordinates and momenta. This does not include all possible
systems in quantum mechanics.
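
As a small worked illustration of the procedure just described (an example chosen here, not one given in the text), take a single degree of freedom with $\xi = q^2$ and $\eta = p^2$. Using only the product laws (5) and the elementary value $[q,p] = 1$ from (8), and keeping the order of the factors throughout, one finds:

```latex
\begin{aligned}
[q^2, p] &= q[q,p] + [q,p]q = 2q, \\
[q^2, p^2] &= [q^2,p]\,p + p\,[q^2,p] = 2(qp + pq), \\
q^2p^2 - p^2q^2 &= i\hbar[q^2,p^2] = 2i\hbar(qp + pq).
\end{aligned}
```

The classical definition (1) gives $[q^2,p^2] = 4qp$ for the same pair, so the quantum value differs from the classical one only in requiring the symmetrical order $qp + pq$ in place of $2qp$.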

Equations (7) and (9) provide the foundation for the analogy between quantum 
mechanics and classical mechanics. They show that classical mechanics may 
be regarded as the limiting case of quantum mechanics when $\hbar$ tends to zero.
A P.B. in quantum mechanics is a purely algebraic notion and is thus a rather 
more fundamental concept than a classical P.B., which can be defined only with 
reference to a set of canonical coordinates and momenta. For this reason canonical 
coordinates and momenta are of less importance in quantum mechanics than in 
classical mechanics; in fact, we may have a system in quantum mechanics for which 
canonical coordinates and momenta do not exist and we can still give a meaning 
to P.B.s. Such a system would be one without a classical analogue and we should 
not be able to obtain its quantum conditions by the method here described. 

From equations (9) we see that two variables with different suffixes $r$ and $s$
always commute. It follows that any function of $q_r$ and $p_r$ will commute with any
function of $q_s$ and $p_s$ when $s$ differs from $r$. Different values of $r$ correspond to
different degrees of freedom of the dynamical system, so we get the result that
dynamical variables referring to different degrees of freedom commute. This law,
as we have derived it from (9), is proved only for dynamical systems with classical
analogues, but we assume it to hold generally. In this way we can make a start
on the problem of finding quantum conditions for dynamical systems for which
canonical coordinates and momenta do not exist, provided we can give a meaning
to different degrees of freedom, as we may be able to do with the help of
physical insight.

We can now see the physical meaning of the division, which was discussed in 
the preceding section, of the dynamical variables into sets, any member of one set 
commuting with any member of another. Each set corresponds to certain degrees 
of freedom, or possibly just one degree of freedom. The division may correspond to 
the physical process of resolving the dynamical system into its constituent parts, 
each constituent being capable of existing by itself as a physical system, 
and the various constituents having to be brought into interaction with one 
another to produce the original system. Alternatively the division may be 
merely a mathematical procedure of resolving the dynamical system into degrees 
of freedom which cannot be separated physically, e.g. the system consisting 
of a particle with internal structure may be divided into the degrees of 
freedom describing the motion of the centre of the particle and those describing 
the internal structure. 


22. Schrödinger’s representation

Let us consider a dynamical system with n degrees of freedom having a classical 
analogue, and thus describable in terms of canonical coordinates and momenta 
$q_r$, $p_r$ ($r = 1, 2, \ldots, n$). We assume that the coordinates $q_r$ are all observables and
have continuous ranges of eigenvalues, these assumptions being reasonable from 
the physical significance of the q’s. Let us set up a representation with the q’s 
diagonal. The question arises whether the q’s form a complete commuting set 
for this dynamical system. It seems pretty obvious from inspection that they do. 
We shall here assume that they do, and the assumption will be justified later 
(see p. 76). With the q’s forming a complete commuting set, the representation 
is fixed except for the arbitrary phase factors in it. 

Let us consider first the case of $n = 1$, so that there is only one $q$ and $p$,
satisfying
$$qp - pq = i\hbar. \tag{10}$$
Any ket may be written in the standard ket notation $\psi(q)\rangle$. From it we can form
another ket $d\psi/dq\rangle$, whose representative is the derivative of the original one.
This new ket is a linear function of the original one and is thus the result of some
linear operator applied to the original one. Calling this linear operator $d/dq$,
we have
$$\frac{d}{dq}\psi\rangle = \frac{d\psi}{dq}\rangle. \tag{11}$$
Equation (11) holding for all functions $\psi$ defines the linear operator $d/dq$. We have
$$\frac{d}{dq}\rangle = 0. \tag{12}$$


*[The possibility that follows from equation (28).]
Let us treat the linear operator $d/dq$ according to the general theory of linear
operators of §7. We should then be able to apply it to a bra $\langle\phi(q)$, the product
$\langle\phi\,d/dq$ being defined, according to (3) of §7, by
$$\left\{\langle\phi\frac{d}{dq}\right\}\psi\rangle = \langle\phi\left\{\frac{d}{dq}\psi\rangle\right\} \tag{13}$$
for all functions $\psi(q)$. Taking representatives, we get
$$\int\langle\phi\frac{d}{dq}|q'\rangle\,dq'\,\psi(q') = \int\phi(q')\,dq'\,\frac{d\psi(q')}{dq'}. \tag{14}$$
We can transform the right-hand side by partial integration and get
$$\int\langle\phi\frac{d}{dq}|q'\rangle\,dq'\,\psi(q') = -\int\frac{d\phi(q')}{dq'}\,dq'\,\psi(q'), \tag{15}$$
provided the contributions from the limits of integration vanish. This gives
$$\langle\phi\frac{d}{dq}|q'\rangle = -\frac{d\phi(q')}{dq'},$$
showing that
$$\langle\phi\frac{d}{dq} = -\langle\frac{d\phi}{dq}. \tag{16}$$
Thus $d/dq$ operating to the left on the conjugate complex of a wave function has
the meaning of minus differentiation with respect to $q$.

The validity of this result depends on our being able to make the passage 
from (14) to (15), which requires that we must restrict ourselves to bras 
and kets corresponding to wave functions that satisfy suitable boundary 
conditions. The conditions usually holding in practice are that they vanish 
at the boundaries. (Somewhat more general conditions will be given in 
the next section.) These conditions do not limit the physical applicability of 
the theory, but, on the contrary, are usually required also on physical grounds. 
For example, if $q$ is a Cartesian coordinate of a particle, its eigenvalues run from
$-\infty$ to $\infty$, and the physical requirement that the particle has zero probability
of being at infinity leads to the condition that the wave function vanishes for‡
$q \to \pm\infty$.
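
The passage from (14) to (15) can be checked numerically for functions that do vanish at the boundaries. The following sketch assumes two particular real functions of Gaussian type, chosen here only for illustration, and approximates the integrals by the trapezoidal rule on a finite interval; the names `phi`, `psi`, `dphi`, `dpsi` and `integrate` are hypothetical.

```python
import math

# Sample functions vanishing rapidly as q -> +-infinity (assumed for this sketch).
def phi(q): return math.exp(-q * q)
def dphi(q): return -2.0 * q * math.exp(-q * q)
def psi(q): return q * math.exp(-0.5 * q * q)
def dpsi(q): return (1.0 - q * q) * math.exp(-0.5 * q * q)

def integrate(f, lo=-10.0, hi=10.0, n=4000):
    """Trapezoidal rule on [lo, hi]; the integrands are negligible outside."""
    step = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * step)
    return total * step

# Right-hand side of (14): integral of phi * dpsi/dq.
side_14 = integrate(lambda q: phi(q) * dpsi(q))

# Right-hand side of (15): minus the integral of dphi/dq * psi.
side_15 = -integrate(lambda q: dphi(q) * psi(q))

print(side_14, side_15)
```

For these functions both integrals can also be done by hand and equal $\tfrac{2}{3}\sqrt{2\pi/3}$, so the agreement of the two printed numbers reflects the vanishing of the contributions from the limits of integration.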

The conjugate complex of the linear operator $d/dq$ can be evaluated by noting
that the conjugate imaginary of $(d/dq)\psi\rangle$ or $d\psi/dq\rangle$ is $\langle d\bar\psi/dq$, or $-\langle\bar\psi\,d/dq$
from (16). Thus the conjugate complex of $d/dq$ is $-d/dq$, so $d/dq$ is§ an imaginary
linear operator.

To get the representative of $d/dq$ we note that, from an application of formula
(63) of §20,
$$|q''\rangle = \delta(q - q'')\rangle, \tag{17}$$
so that
$$\frac{d}{dq}|q''\rangle = \frac{d}{dq}\delta(q - q'')\rangle, \tag{18}$$
and hence
$$\langle q'|\frac{d}{dq}|q''\rangle = \frac{d}{dq'}\delta(q' - q''). \tag{19}$$
The representative of $d/dq$ involves the derivative of the $\delta$ function.

‡[‘→’ replaces ‘=’]
§[‘pure’ is omitted]
Let us work out the commutation relation connecting $d/dq$ with $q$. We have
$$\frac{d}{dq}q\psi\rangle = \frac{d(q\psi)}{dq}\rangle = q\frac{d}{dq}\psi\rangle + \psi\rangle. \tag{20}$$
Since this holds for any ket $\psi\rangle$, we have
$$\frac{d}{dq}q - q\frac{d}{dq} = 1. \tag{21}$$
Comparing this result with (10), we see that $-i\hbar\,d/dq$ satisfies the same
commutation relation with $q$ that $p$ does.
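
Equation (21) can be tried on polynomial wave functions, for which differentiation and multiplication by $q$ are exact operations on the coefficients. The sketch below assumes that representation (a list `c` standing for $c_0 + c_1q + c_2q^2 + \ldots$), which is a choice made for this illustration; the helper names are hypothetical.

```python
def trim(c):
    """Drop trailing zero coefficients, keeping at least one entry."""
    c = list(c)
    while len(c) > 1 and c[-1] == 0:
        c.pop()
    return c

def ddq(c):
    """Coefficients of d(psi)/dq, as in (11)."""
    return trim([k * c[k] for k in range(1, len(c))] or [0])

def times_q(c):
    """Coefficients of q * psi."""
    return trim([0] + list(c))

def sub(a, b):
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    return trim([x - y for x, y in zip(a, b)])

def commutator_applied(c):
    """(d/dq q - q d/dq) applied to psi; by (21) this should return psi."""
    return sub(ddq(times_q(c)), times_q(ddq(c)))

psi = [5, -3, 0, 2]      # 5 - 3q + 2q^3
print(commutator_applied(psi))
```

Multiplying the result by $-i\hbar$ gives the statement made in the text, that $-i\hbar\,d/dq$ and $q$ satisfy the relation (10).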

To extend the foregoing work to the case of arbitrary $n$, we write the general ket
as $\psi(q_1\ldots q_n)\rangle = \psi\rangle$ and introduce the $n$ linear operators $\partial/\partial q_r$ ($r = 1, \ldots, n$),
which can operate on it in accordance with the formula
$$\frac{\partial}{\partial q_r}\psi\rangle = \frac{\partial\psi}{\partial q_r}\rangle, \tag{22}$$
corresponding to (11). We have
$$\frac{\partial}{\partial q_r}\rangle = 0, \tag{23}$$
corresponding to (12). Provided we restrict ourselves to bras and kets
corresponding to wave functions satisfying suitable boundary conditions,
these linear operators can operate also on bras, in accordance with the formula
$$\langle\phi\frac{\partial}{\partial q_r} = -\langle\frac{\partial\phi}{\partial q_r}, \tag{24}$$
corresponding to (16). Thus $\partial/\partial q_r$ can operate to the left on the conjugate complex
of a wave function, when it has the meaning of minus partial differentiation with
respect to $q_r$. We find as before that each $\partial/\partial q_r$ is* an imaginary linear operator.

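Corresponding to (21) we have the commutation relations
$$\frac{\partial}{\partial q_r}q_s - q_s\frac{\partial}{\partial q_r} = \delta_{rs}. \tag{25}$$
We have further
$$\frac{\partial}{\partial q_r}\frac{\partial}{\partial q_s}\psi\rangle = \frac{\partial^2\psi}{\partial q_r\,\partial q_s}\rangle = \frac{\partial}{\partial q_s}\frac{\partial}{\partial q_r}\psi\rangle, \tag{26}$$
showing that
$$\frac{\partial}{\partial q_r}\frac{\partial}{\partial q_s} = \frac{\partial}{\partial q_s}\frac{\partial}{\partial q_r}. \tag{27}$$
Comparing (25) and (27) with (9), we see that the linear operators $-i\hbar\,\partial/\partial q_r$
satisfy the same commutation relations with the $q$’s and with each other that
the $p$’s do.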

It would be possible to take
$$p_r = -i\hbar\,\partial/\partial q_r \tag{28}$$
without getting any inconsistency. This possibility enables us to see that the $q$’s
must form a complete commuting set of observables, since it means that any
function of the $q$’s and $p$’s could be taken to be a function of the $q$’s and $-i\hbar\,\partial/\partial q$’s
and then could not commute with all the $q$’s unless it is a function of the $q$’s only.

*[‘pure’ is omitted.]

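The equations (28) do not necessarily hold. But in any case the quantities
$p_r + i\hbar\,\partial/\partial q_r$ each commute with all the $q$’s, so each of them is a function of
the $q$’s, from Theorem 2 of §19. Thus
$$p_r = -i\hbar\,\partial/\partial q_r + f_r(q). \tag{29}$$
Since $p_r$ and $-i\hbar\,\partial/\partial q_r$ are both real, $f_r(q)$ must be real. For any function $f$ of
the $q$’s we have
$$\frac{\partial}{\partial q_r}f\psi\rangle = f\frac{\partial}{\partial q_r}\psi\rangle + \frac{\partial f}{\partial q_r}\psi\rangle,$$
showing that
$$\frac{\partial}{\partial q_r}f - f\frac{\partial}{\partial q_r} = \frac{\partial f}{\partial q_r}. \tag{30}$$
With the help of (29) we can now deduce the general formula
$$p_rf - fp_r = -i\hbar\,\partial f/\partial q_r. \tag{31}$$
This formula may be written in P.B. notation
$$[f, p_r] = \partial f/\partial q_r, \tag{32}$$
when it is the same as in the classical theory, as follows from (1). Multiplying (27)
by $(-i\hbar)^2$ and substituting for $-i\hbar\,\partial/\partial q_r$ and $-i\hbar\,\partial/\partial q_s$ their values given by (29),
we get
$$(p_r - f_r)(p_s - f_s) = (p_s - f_s)(p_r - f_r),$$
which reduces, with the help of the quantum condition $p_rp_s = p_sp_r$, to
$$p_rf_s + f_rp_s = p_sf_r + f_sp_r.$$
This reduces further, with the help of (31), to
$$\partial f_s/\partial q_r = \partial f_r/\partial q_s, \tag{33}$$
showing that the functions $f_r$ are all of the form
$$f_r = \partial F/\partial q_r \tag{34}$$
with $F$ independent of $r$. Equation (29) now becomes
$$p_r = -i\hbar\,\partial/\partial q_r + \partial F/\partial q_r. \tag{35}$$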


We have been working with a representation which is fixed to the extent 
that the q’s must be diagonal in it, but which contains arbitrary phase factors. 
If the phase factors are changed, the operators $\partial/\partial q_r$ get changed. It will now be
shown that, by a suitable change in the phase factors, the function $F$ in (35) can
be made to vanish, so that equations (28) are made to hold.

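Using stars to distinguish quantities referring to the new representation with
the new phase factors, we shall have the new basic bras connected with the previous
ones by†
$$\langle q_1'\ldots q_n'{}^*| = e^{i\gamma'}\langle q_1'\ldots q_n'|, \tag{36}$$
where $\gamma' = \gamma(q')$ is a real function of the $q'$’s. The new representative of a ket is
$e^{i\gamma'}$ times the old one, showing that $e^{i\gamma}\psi\rangle^* = \psi\rangle$, so we get
$$\rangle^* = e^{-i\gamma}\rangle \tag{37}$$
as the connexion between the new standard ket and the original one. The new
linear operator $(\partial/\partial q_r)^*$ satisfies, corresponding to (22),
$$\left(\frac{\partial}{\partial q_r}\right)^*\psi\rangle^* = \frac{\partial\psi}{\partial q_r}\rangle^* = e^{-i\gamma}\frac{\partial\psi}{\partial q_r}\rangle$$
with the help of (37). Using (22), this gives
$$\left(\frac{\partial}{\partial q_r}\right)^*\psi\rangle^* = e^{-i\gamma}\frac{\partial}{\partial q_r}\psi\rangle = e^{-i\gamma}\frac{\partial}{\partial q_r}e^{i\gamma}\psi\rangle^*,$$
showing that
$$\left(\frac{\partial}{\partial q_r}\right)^* = e^{-i\gamma}\frac{\partial}{\partial q_r}e^{i\gamma}, \tag{38}$$
or, with the help of (30),
$$\left(\frac{\partial}{\partial q_r}\right)^* = \frac{\partial}{\partial q_r} + i\frac{\partial\gamma}{\partial q_r}. \tag{39}$$
By choosing $\gamma$ so that
$$F = \hbar\gamma + \text{a constant}, \tag{40}$$
(35) becomes
$$p_r = -i\hbar\,(\partial/\partial q_r)^*. \tag{41}$$
Equation (40) fixes $\gamma$ except for an arbitrary constant, so the representation is
fixed except for an arbitrary constant phase factor.

†[Suffix 1 included in the right-hand side range of $q'$ of the equation.]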
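
The step from the transformed operator to the removal of $F$ may be spelled out by letting the operator act on an arbitrary wave function $\psi$. The lines below are a sketch of that check in the notation of this section, added for illustration; they use only the product rule for differentiation and the choice of $\gamma$ stated in the text.

```latex
\begin{aligned}
e^{-i\gamma}\frac{\partial}{\partial q_r}\bigl(e^{i\gamma}\psi\bigr)
  &= \frac{\partial\psi}{\partial q_r} + i\frac{\partial\gamma}{\partial q_r}\psi, \\
-i\hbar\left(\frac{\partial}{\partial q_r}\right)^{*}
  &= -i\hbar\frac{\partial}{\partial q_r} + \hbar\frac{\partial\gamma}{\partial q_r}
   = -i\hbar\frac{\partial}{\partial q_r} + \frac{\partial F}{\partial q_r}
   \quad\text{when } F = \hbar\gamma + \text{constant}.
\end{aligned}
```

The last expression is the right-hand side of the equation for $p_r$ containing $\partial F/\partial q_r$, so with this choice of $\gamma$ the momenta are $-i\hbar$ times the starred differential operators, and the additive constant in $\gamma$ drops out because only its derivatives appear.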

In this way we see that a representation can be set up in which the $q$’s
are diagonal and equations (28) hold. This representation is a very useful one
for many problems. It will be called Schrödinger’s representation, as it was
the representation in terms of which Erwin Schrödinger gave his original
formulation of quantum mechanics in 1926. Schrödinger’s representation exists
whenever one has canonical $q$’s and $p$’s, and is completely determined by these $q$’s
and $p$’s except for an arbitrary constant phase factor. It owes its great convenience
to its allowing one to express immediately any algebraic function of the $q$’s and
$p$’s of the form of a power series in the $p$’s as an operator of differentiation,
e.g. if $f(q_1, \ldots, q_n, p_1, \ldots, p_n)$ is such a function, we have
$$f(q_1, \ldots, q_n, p_1, \ldots, p_n) = f(q_1, \ldots, q_n, -i\hbar\,\partial/\partial q_1, \ldots, -i\hbar\,\partial/\partial q_n), \tag{42}$$
provided we preserve the order of the factors in a product on substituting
the $-i\hbar\,\partial/\partial q$’s for the $p$’s.

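From (23) and (28), we have
$$p_r\rangle = 0. \tag{43}$$
Thus the standard ket in Schrödinger’s representation is characterized by
the condition that it is a simultaneous eigenket of all the momenta belonging
to the eigenvalues zero. Some properties of the basic vectors of Schrödinger’s
representation may also be noted. Equation (22) gives
$$\langle q_1'\ldots q_n'|\frac{\partial}{\partial q_r}\psi\rangle = \langle q_1'\ldots q_n'|\frac{\partial\psi}{\partial q_r}\rangle = \frac{\partial\psi(q_1'\ldots q_n')}{\partial q_r'} = \frac{\partial}{\partial q_r'}\langle q_1'\ldots q_n'|\psi\rangle.$$
Hence
$$\langle q_1'\ldots q_n'|\frac{\partial}{\partial q_r} = \frac{\partial}{\partial q_r'}\langle q_1'\ldots q_n'|, \tag{44}$$
so that
$$\langle q_1'\ldots q_n'|p_r = -i\hbar\frac{\partial}{\partial q_r'}\langle q_1'\ldots q_n'|. \tag{45}$$
Similarly, equation (24) leads to
$$p_r|q_1'\ldots q_n'\rangle = i\hbar\frac{\partial}{\partial q_r'}|q_1'\ldots q_n'\rangle. \tag{46}$$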
23. The momentum representation

Let us take a system with one degree of freedom, describable in terms of a $q$ and
$p$ with the eigenvalues of $q$ running from $-\infty$ to $\infty$, and let us take an eigenket
$|p'\rangle$ of $p$. Its representative in the Schrödinger representation, $\langle q'|p'\rangle$, satisfies
$$p'\langle q'|p'\rangle = \langle q'|p|p'\rangle = -i\hbar\frac{d}{dq'}\langle q'|p'\rangle,$$
with the help of (45) applied to the case of one degree of freedom. The solution
of this differential equation for $\langle q'|p'\rangle$ is
$$\langle q'|p'\rangle = c'e^{ip'q'/\hbar}, \tag{47}$$
where $c' = c(p')$ is independent of $q'$, but may involve $p'$.
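
That the exponential form solves the differential equation, and that its modulus does not fall off at large $q'$, can be confirmed numerically. The sketch below makes several assumptions that are not in the text: units with $\hbar = 1$, the constant $c'$ set to 1, a central-difference approximation to the derivative, and the hypothetical function names `plane_wave` and `momentum_applied`.

```python
import cmath

HBAR = 1.0  # assumption of this sketch: units in which hbar = 1

def plane_wave(q, p, c=1.0):
    """Representative c * exp(i p q / hbar) of an eigenket of p."""
    return c * cmath.exp(1j * p * q / HBAR)

def momentum_applied(psi, q, h=1e-5):
    """-i hbar d(psi)/dq at the point q, by a central difference."""
    return -1j * HBAR * (psi(q + h) - psi(q - h)) / (2.0 * h)

p_prime = 0.8
q_values = [-2.0, 0.0, 1.5]

# Size of (-i hbar d/dq - p') acting on the representative at a few points.
residuals = [
    abs(momentum_applied(lambda x: plane_wave(x, p_prime), q)
        - p_prime * plane_wave(q, p_prime))
    for q in q_values
]

# The modulus stays equal to |c'| however large |q'| becomes.
moduli = [abs(plane_wave(q, p_prime)) for q in (-1e3, 0.0, 1e3)]

print(max(residuals), moduli)
```

The constant modulus is the feature taken up next in the text: the representative does not vanish at infinity, so the earlier boundary conditions are not satisfied.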

The representative (q' |p’) does not satisfy the boundary conditions of 
vanishing as' gq’ — +too. This gives rise to some difficulty, which shows itself 
up most directly in the failure of the orthogonality theorem. If we take a second 
eigenket |p") of p with representative (q’ | p’) = c’e""/" belonging to a different 
eigenvalue p”, we shall have 

(p' |p") = / (po! |) dq’ (q' |p") =ae" : ee eg. (48) 
This integral does not converge according to the usual definition of convergence.
To bring the theory into order, we adopt a new definition of convergence of
an integral whose domain extends to infinity, analogous to the Cesàro definition
of the sum of an infinite series. With this new definition, an integral whose
value to the upper limit $q'$ is of the form $\cos aq'$ or $\sin aq'$, with $a$ a real number
not zero, is counted as zero when $q'$ tends to infinity, i.e. we take the mean
value of the oscillations, and similarly for the lower limit of $q'$ tending to minus
infinity. This makes the right-hand side of (48) vanish for $p'' \neq p'$, so that
the orthogonality theorem is restored. Also it makes the right-hand sides of (13)
and (14) equal when $\langle\phi|$ and $|\psi\rangle$ are eigenvectors of $p$, so that eigenvectors of $p$
become permissible vectors to use with the operator $d/dq$. Thus the boundary
conditions that the representative of a permissible bra or ket has to satisfy become
extended to allow the representative to oscillate like $\cos aq'$ or $\sin aq'$ as $q'$ goes to
infinity or minus infinity.
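The averaging rule may be illustrated numerically. The following sketch is an added illustration and not part of the argument: it takes the value of the integral to the upper limit $g$ to be $\sin ag$ or $\cos ag$ and forms the mean of that value over $0 \le g \le G$. The function name `cesaro_mean`, the value of `a` and the step count are assumptions made only for the example.

```python
import math

def cesaro_mean(f, G, steps=100_000):
    """Mean value of f(g) over 0 <= g <= G, by the midpoint rule."""
    dg = G / steps
    return sum(f((k + 0.5) * dg) for k in range(steps)) * dg / G

a = 1.7  # hypothetical value; any real number other than zero will do

for G in (10.0, 100.0, 1000.0):
    mean_sin = cesaro_mean(lambda g: math.sin(a * g), G)
    mean_cos = cesaro_mean(lambda g: math.cos(a * g), G)
    # Both means are bounded by 2/(a*G), so they shrink as the cut-off G grows.
    print(f"G = {G:7.1f}   mean sin = {mean_sin:+.6f}   mean cos = {mean_cos:+.6f}")
```

The oscillating values themselves never settle down, but their mean tends to zero as the cut-off grows, which is the sense in which such an integral is counted as zero.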

For $p''$ very close to $p'$, the right-hand side of (48) involves a $\delta$ function.
To evaluate it, we need the formula

$$\int_{-\infty}^{\infty} e^{iax}\,dx = 2\pi\,\delta(a) \tag{49}$$

for real $a$, which may be proved as follows. The formula evidently holds for $a$
different from zero, as both sides are then zero. Further we have, for any continuous
function $f(a)$,

$$\int_{-\infty}^{\infty} f(a)\,da \int_{-g}^{g} e^{iax}\,dx = \int_{-\infty}^{\infty} f(a)\,da\;2a^{-1}\sin ag = 2\pi f(0)$$


in the limit when $g$ tends to infinity. A more complicated argument shows that
we get the same result if instead of the limits $g$ and $-g$ we put $g_1$ and $-g_2$, and then
let $g_1$ and $g_2$ tend to infinity in different ways (not too widely different). This shows
the equivalence of both sides of (49) as factors in an integrand, which proves
the formula.

† ['→' replaces '='.]
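The limit used in this proof can be checked numerically. The sketch below is an added illustration under stated assumptions: the test function is taken to be the Gaussian $f(a) = e^{-a^2}$, the range of $a$ is cut off at $|a| = 8$, and the names `smeared_kernel` and `gaussian` are hypothetical. It evaluates $\int f(a)\,2a^{-1}\sin ag\,da$ for increasing $g$ and compares it with $2\pi f(0)$.

```python
import math

def gaussian(a):
    """Test function f(a) = exp(-a^2), so that f(0) = 1."""
    return math.exp(-a * a)

def smeared_kernel(f, g, a_max=8.0, steps=200_000):
    """Midpoint-rule estimate of the integral of f(a) * 2 sin(a g) / a over |a| < a_max."""
    da = 2.0 * a_max / steps
    total = 0.0
    for k in range(steps):
        a = -a_max + (k + 0.5) * da
        # 2 sin(a g) / a is the inner integral of exp(i a x) over -g < x < g.
        kernel = 2.0 * g if a == 0.0 else 2.0 * math.sin(a * g) / a
        total += f(a) * kernel
    return total * da

for g in (1.0, 5.0, 20.0):
    value = smeared_kernel(gaussian, g)
    print(f"g = {g:5.1f}   integral = {value:.6f}   2*pi*f(0) = {2 * math.pi:.6f}")
```

For this particular test function the integral is known in closed form, $2\pi\,\mathrm{erf}(g/2)$, so the approach to $2\pi f(0)$ is very rapid; a less smooth $f$ would converge more slowly.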
With the help of (49), (48) becomes

$$\langle p'|p''\rangle = \bar{c}'c''\,2\pi\,\delta[(p'-p'')/\hbar] = \bar{c}'c''\,h\,\delta(p'-p'') = |c'|^2 h\,\delta(p'-p''). \tag{50}$$
We have obtained an eigenket of $p$ belonging to any real eigenvalue $p'$,
its representative being given by (47). Any ket $|X\rangle$ can be expanded in terms
of these eigenkets of $p$, since its representative $\langle q'|X\rangle$ can be expanded in terms
of the representatives (47) by Fourier analysis. It follows that the momentum $p$
is an observable, in agreement with the experimental result that momenta can
be observed.
A symmetry now appears between $q$ and $p$. Each of them is an observable with
eigenvalues extending from $-\infty$ to $\infty$, and the commutation relation connecting
$q$ and $p$, equation (10), remains invariant if we interchange $q$ and $p$ and write $-i$
for $i$. We have set up a representation in which $q$ is diagonal and $p = -i\hbar\,d/dq$.
It follows from the symmetry that we can also set up a representation in which $p$
is diagonal and

$$q = i\hbar\,d/dp, \tag{51}$$

the operator $d/dp$ being defined by a procedure similar to that used for $d/dq$.
This representation will be called the momentum representation. It is less useful
than the previous Schrödinger representation because, while the Schrödinger
representation enables one to express as an operator of differentiation any function
of $q$ and $p$ that is a power series in $p$, the momentum representation enables one
so to express any function of $q$ and $p$ that is a power series in $q$, and the important
quantities in dynamics are almost always power series in $p$ but are often not power
series in $q$. All the same the momentum representation is of value for certain
problems (see §50).

Let us calculate the transformation function $\langle q'|p'\rangle$ connecting the two
representations. The basic kets $|p'\rangle$ of the momentum representation are eigenkets
of $p$ and their Schrödinger representatives $\langle q'|p'\rangle$ are given by (47) with
the coefficients $c'$ suitably chosen. The phase factors of these basic kets must
be chosen so as to make (51) hold. The easiest way to bring in this condition
is to use the symmetry between $q$ and $p$ referred to above, according to which
$\langle q'|p'\rangle$ must go over into $\langle p'|q'\rangle$ if we interchange $q'$ and $p'$ and write $-i$ for $i$.
Now $\langle q'|p'\rangle$ is equal to the right-hand side of (47) and $\langle p'|q'\rangle$ to the conjugate
complex expression, and hence $c'$ must be independent of $p'$. Thus $c'$ is just
a number $c$. Further, we must have

$$\langle p'|p''\rangle = \delta(p'-p''),$$

which shows, on comparison with (50), that $|c| = h^{-1/2}$. We can choose
the arbitrary constant phase factor in either representation so as to make $c = h^{-1/2}$,
and we then get

$$\langle q'|p'\rangle = h^{-1/2}e^{ip'q'/\hbar} \tag{52}$$

for the transformation function.
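As a consistency check, one may verify that the transformation function (52) does lead to (51). The short calculation below is an added sketch: it assumes only (52), the fact that $q$ is diagonal in the Schrödinger representation, and that the differentiation may be taken outside the integral for the ket $|X\rangle$ considered.

```latex
\begin{aligned}
\langle p'|q|X\rangle
  &= \int \langle p'|q'\rangle\, q'\, dq'\, \langle q'|X\rangle
   = h^{-1/2}\int e^{-ip'q'/\hbar}\, q'\, dq'\, \langle q'|X\rangle \\
  &= i\hbar\,\frac{d}{dp'}\; h^{-1/2}\int e^{-ip'q'/\hbar}\, dq'\, \langle q'|X\rangle
   = i\hbar\,\frac{d}{dp'}\,\langle p'|X\rangle .
\end{aligned}
```

The second line uses $i\hbar\,(d/dp')\,e^{-ip'q'/\hbar} = q'\,e^{-ip'q'/\hbar}$. A coefficient $c'$ depending on $p'$ would contribute an extra term from the differentiation, which is why the phases must be chosen so that $c'$ is a constant.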




The foregoing work may easily be generalized to a system with $n$ degrees of
freedom, describable in terms of $n$ $q$'s and $p$'s, with the eigenvalues of each $q$
running from $-\infty$ to $\infty$. Each $p$ will then be an observable with eigenvalues
running from $-\infty$ to $\infty$, and there will be symmetry between the set of $q$'s and
the set of $p$'s, the commutation relations remaining invariant if we interchange
each $q_r$ with the corresponding $p_r$ and write $-i$ for $i$. A momentum representation
can be set up in which the $p$'s are diagonal and each

$$q_r = i\hbar\,\partial/\partial p_r. \tag{53}$$

The transformation function connecting it with the Schrödinger representation will
be given by the product of the transformation functions for each degree of freedom
separately, as is shown by formula (67) of §20, and will thus be

$$\langle q'_1 q'_2 \ldots q'_n | p'_1 p'_2 \ldots p'_n\rangle = \langle q'_1|p'_1\rangle\langle q'_2|p'_2\rangle\cdots\langle q'_n|p'_n\rangle = h^{-n/2}\,e^{i(p'_1q'_1 + p'_2q'_2 + \cdots + p'_nq'_n)/\hbar}. \tag{54}$$


24. Heisenberg’s principle of uncertainty 
For a system with one degree of freedom, the Schrödinger and the momentum
representatives of a ket $|X\rangle$ are connected by

$$\begin{aligned}
\langle p'|X\rangle &= h^{-1/2}\int_{-\infty}^{\infty} e^{-iq'p'/\hbar}\,dq'\,\langle q'|X\rangle,\\
\langle q'|X\rangle &= h^{-1/2}\int_{-\infty}^{\infty} e^{iq'p'/\hbar}\,dp'\,\langle p'|X\rangle.
\end{aligned} \tag{55}$$

These formulae have an elementary significance. They show that either of
the representatives is given, apart from numerical coefficients, by the amplitudes
of the Fourier components of the other.

It is interesting to apply (55) to a ket whose Schrödinger representative consists
of what is called a wave packet. This is a function whose value is very small
everywhere outside a certain domain, of width $\Delta q'$ say, and inside this domain is
approximately periodic with a definite frequency.† If a Fourier analysis is made of
such a wave packet, the amplitude of all the Fourier components will be small,
except those in the neighbourhood of the definite frequency. The components
whose amplitudes are not small will fill up a frequency band whose width is of
the order $1/\Delta q'$, since two components whose frequencies differ by this amount,
if in phase in the middle of the domain $\Delta q'$, will be just out of phase and interfering
at the ends of this domain. Now in the first of equations (55) the variable
$(2\pi)^{-1}p'/\hbar = p'/h$ plays the part of frequency. Thus with $\langle q'|X\rangle$ of the form
of a wave packet, the function $\langle p'|X\rangle$, being composed of the amplitudes of
the Fourier components of the wave packet, will be small everywhere in the $p'$-space
outside a certain domain of width $\Delta p' = h/\Delta q'$.


† Frequency here means reciprocal of wave-length.
Let us now apply the physical interpretation of the square of the modulus of
the representative of a ket as a probability. We find that our wave packet represents
a state for which a measurement of $q$ is almost certain to lead to a result lying
in a domain of width $\Delta q'$ and a measurement of $p$ is almost certain to lead to
a result lying in a domain of width $\Delta p'$. We may say that for this state $q$ has
a definite value with an error of order $\Delta q'$ and $p$ has a definite value with an error
of order $\Delta p'$. The product of these two errors is

$$\Delta q'\,\Delta p' = h. \tag{56}$$
Thus the more accurately one of the variables q or p has a definite value, the less 
accurately the other has a definite value. For a system with several degrees 
of freedom, equation (56) applies to each degree of freedom separately. 

Equation (56) is known as Heisenberg’s Principle of Uncertainty. 
It shows clearly the limitations in the possibility of simultaneously assigning 
numerical values, for any particular state, to two noncommuting observables, 
when those observables are a canonical coordinate and momentum, and provides 
a plain illustration of how observations in quantum mechanics may be 
incompatible. It also shows how classical mechanics, which assumes that 
numerical values can be assigned simultaneously to all observables, may be 
a valid approximation when h can be considered as small enough to be negligible. 
Equation (56) holds only in the most favourable case, which occurs when
the representative of the state is of the form of a wave packet. Other forms of
representative would lead to a $\Delta q'$ and $\Delta p'$ whose product is larger than $h$.
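The reciprocal relation between the two widths can be seen in a numerical example. The sketch below is an added illustration under several assumptions that are not in the text: the wave packet is taken to be Gaussian, units are chosen with $\hbar = 1$, the widths are defined as root-mean-square deviations, and the integral in the first of equations (55) is replaced by a sum over a finite grid. The names `widths` and `rms_width` and all grid parameters are hypothetical.

```python
import cmath
import math

HBAR = 1.0                 # units with hbar = 1 (assumption)
H = 2.0 * math.pi * HBAR   # Planck's constant h in the same units

def rms_width(xs, weights):
    """Root-mean-square deviation of xs under the given (unnormalised) weights."""
    total = sum(weights)
    mean = sum(x * w for x, w in zip(xs, weights)) / total
    var = sum((x - mean) ** 2 * w for x, w in zip(xs, weights)) / total
    return math.sqrt(var)

def widths(sigma, p0=2.0, n=400, m=320):
    """Return (rms width in q, rms width in p) for a Gaussian packet of parameter sigma."""
    dq, dp = 40.0 / n, 16.0 / m
    qs = [-20.0 + (k + 0.5) * dq for k in range(n)]
    ps = [p0 - 8.0 + (k + 0.5) * dp for k in range(m)]
    # Schrodinger representative: Gaussian envelope times a periodic factor.
    psi = [cmath.exp(-q * q / (4.0 * sigma ** 2) + 1j * p0 * q / HBAR) for q in qs]
    # Momentum representative from the first of equations (55), as a sum over the grid.
    phi = [sum(cmath.exp(-1j * q * p / HBAR) * s for q, s in zip(qs, psi)) * dq / math.sqrt(H)
           for p in ps]
    return (rms_width(qs, [abs(s) ** 2 for s in psi]),
            rms_width(ps, [abs(f) ** 2 for f in phi]))

for sigma in (0.5, 1.0, 2.0):
    width_q, width_p = widths(sigma)
    print(f"sigma = {sigma:3.1f}   width q = {width_q:.4f}   width p = {width_p:.4f}   "
          f"product = {width_q * width_p:.4f}")
```

Making the packet narrower in $q$ makes it broader in $p$ in exact proportion, so that the product of the widths stays fixed. The numerical value of the product depends on how the widths are defined; with root-mean-square deviations the Gaussian packet gives $\hbar/2$, whereas (56) refers to the rougher notion of the domain outside which the representative is very small.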

Heisenberg's principle of uncertainty shows that, in the limit when either $q$ or
$p$ is completely determined, the other is completely undetermined. This result can
also be obtained directly from the transformation function $\langle q'|p'\rangle$. According to
the end of §18, $|\langle q'|p'\rangle|^2\,dq'$ is proportional to the probability of $q$ having a value
in the small range from $q'$ to $q' + dq'$ for the state for which $p$ certainly has
the value $p'$, and from (52) this probability is independent of $q'$ for a given $dq'$.
Thus if $p$ certainly has a definite value $p'$, all values of $q$ are equally probable.
Similarly, if $q$ certainly has a definite value $q'$, all values of $p$ are equally probable.

It is evident physically that a state for which all values of $q$ are equally
probable, or one for which all values of $p$ are equally probable, cannot be attained
in practice, in the first case because of limitations of size and in the second because
of limitations of energy. Thus an eigenstate of $p$ or an eigenstate of $q$ cannot be
attained in practice. The argument at the end of §12 already showed that such
eigenstates are unattainable, because of the infinite precision that would be needed
to set them up, and we now have another argument leading to the same conclusion.


25. Displacement operators 
We get a new insight into the meaning of some of the quantum conditions by
making a study of displacement operators. These appear in the theory when
we take into consideration that the scheme of relations between states and
dynamical variables given in Chapter II is essentially a physical scheme, so that
if certain states and dynamical variables are connected by some relation, on our
displacing them all in a definite way (for example, displacing them all through
a distance $\delta x$ in the direction of the x-axis of Cartesian coordinates), the new
states and dynamical variables would have to be connected by the same relation.

The displacement of a state or observable is a perfectly definite process 
physically. Thus to displace a state or observable through a distance $\delta x$ in
the direction of the x-axis, we should merely have to displace all the apparatus used
in preparing the state, or all the apparatus required to measure the observable,
through the distance $\delta x$ in the direction of the x-axis, and the displaced
apparatus would define the displaced state or observable. The displacement of 
a dynamical variable must be just as definite as the displacement of an observable, 
because of the close mathematical connexion between dynamical variables and 
observables. A displaced state or dynamical variable is uniquely determined 
by the undisplaced state or dynamical variable together with the direction and 
magnitude of the displacement. 

The displacement of a ket vector is not such a definite thing though. If we take
a certain ket vector, it will represent a certain state and we may displace this state
and get a perfectly definite new state, but this new state will not determine
our displaced ket, but only the direction of our displaced ket. We help to fix our
displaced ket by requiring that it shall have the same length as the undisplaced ket,
but even then it is not completely determined, but can still be multiplied by
an arbitrary phase factor. One would think at first sight that each ket one
displaces would have a different arbitrary phase factor, but with the help of
the following argument, we see that it must be the same for them all. We make
use of the law that superposition relationships between states remain invariant
under the displacement. A superposition relationship between states is expressed
mathematically by a linear equation between the kets corresponding to those
states, for example

$$|R\rangle = c_1|A\rangle + c_2|B\rangle, \tag{57}$$

where $c_1$ and $c_2$ are numbers, and the invariance of the superposition relationship
requires that the displaced states correspond to kets with the same linear equation
between them; in our example they would correspond to $|Rd\rangle$, $|Ad\rangle$, $|Bd\rangle$ say,
satisfying

$$|Rd\rangle = c_1|Ad\rangle + c_2|Bd\rangle. \tag{58}$$

We take these kets to be our displaced kets, rather than these kets multiplied
by arbitrary independent phase factors, which latter kets would satisfy a linear
equation with different coefficients $c_1$ & $c_2$. The only arbitrariness now left in
the displaced kets is that of a single arbitrary phase factor to be multiplied into
all of them.

The condition that linear equations between the kets remain invariant
under the displacement and that an equation such as (58) holds whenever
the corresponding (57) holds, means that the displaced kets are linear functions of
the undisplaced kets and thus each displaced ket $|Pd\rangle$ is the result of some linear
operator applied to the corresponding undisplaced ket $|P\rangle$. In symbols,

$$|Pd\rangle = D|P\rangle, \tag{59}$$

where $D$ is a linear operator independent of $|P\rangle$ and depending only on
the displacement. The arbitrary phase factor by which all the displaced kets
may be multiplied results in $D$ being undetermined to the extent of an arbitrary
numerical factor of modulus unity.

With the displacement of kets made definite in the above manner and 
the displacement of bras, of course, made equally definite, through their being 
the conjugate imaginaries of the kets, we can now assert that any symbolic 
equation between kets, bras and dynamical variables must remain invariant under 
the displacement of every symbol occurring in it, on account of such an equation 
having some physical significance which will not get changed by the displacement. 

Take as an example the equation

$$\langle Q|P\rangle = c,$$

$c$ being a number. Then we must have

$$\langle Qd|Pd\rangle = c = \langle Q|P\rangle. \tag{60}$$

From the conjugate imaginary of (59) with $Q$ instead of $P$,

$$\langle Qd| = \langle Q|\bar{D}. \tag{61}$$

Hence (60) gives

$$\langle Q|\bar{D}D|P\rangle = \langle Q|P\rangle.$$

Since this holds for arbitrary $\langle Q|$ and $|P\rangle$, we must have

$$\bar{D}D = 1, \tag{62}$$

giving us a general condition which $D$ has to satisfy.

Take as a second example the equation

$$v|P\rangle = |R\rangle,$$

where $v$ is any dynamical variable. Then, using $v_d$ to denote the displaced
dynamical variable, we must have

$$v_d|Pd\rangle = |Rd\rangle.$$

With the help of (59) we get

$$v_d|Pd\rangle = D|R\rangle = Dv|P\rangle = DvD^{-1}|Pd\rangle.$$

Since $|Pd\rangle$ can be any ket, we must have

$$v_d = DvD^{-1}, \tag{63}$$

which shows that the linear operator $D$ determines the displacement of dynamical
variables as well as that of kets and bras. Note that the arbitrary numerical factor
of modulus unity in $D$ does not affect $v_d$, and also it does not affect the validity
of (62).

Let us now pass to an infinitesimal displacement, i.e. taking the displacement
through the distance $\delta x$ in the direction of the x-axis, let us make $\delta x \to 0$.
From physical continuity we should expect a displaced ket $|Pd\rangle$ to tend to
the original $|P\rangle$ and we may further expect the limit

$$\lim_{\delta x\to 0}\frac{|Pd\rangle - |P\rangle}{\delta x} = \lim_{\delta x\to 0}\frac{D-1}{\delta x}\,|P\rangle$$

to exist. This requires that the limit

$$\lim_{\delta x\to 0}\,(D-1)/\delta x \tag{64}$$

shall exist. This limit is a linear operator which we shall call the displacement
operator for the x-direction and denote by $d_x$. The arbitrary numerical factor
$e^{i\gamma}$ with $\gamma$ real which we may multiply into $D$ must be made to tend to unity as
$\delta x \to 0$ and then introduces an arbitrariness in $d_x$, namely, $d_x$ may be replaced by

$$\lim_{\delta x\to 0}\,(De^{i\gamma}-1)/\delta x = \lim_{\delta x\to 0}\,(D-1+i\gamma)/\delta x = d_x + ia_x,$$

where $a_x$ is the limit of $\gamma/\delta x$. Thus $d_x$ contains an arbitrary* imaginary number.
For $\delta x$ small

$$D = 1 + \delta x\,d_x. \tag{65}$$

Substituting this into (62), we get

$$(1+\delta x\,\bar{d}_x)(1+\delta x\,d_x) = 1,$$

which reduces, with neglect of $(\delta x)^2$, to

$$\delta x\,(\bar{d}_x + d_x) = 0.$$

Thus $d_x$ is† an imaginary linear operator. Substituting (65) into (63) we get, with
neglect of $(\delta x)^2$ again,

$$v_d = (1+\delta x\,d_x)\,v\,(1-\delta x\,d_x) = v + \delta x\,(d_xv - vd_x), \tag{66}$$

showing that

$$\lim_{\delta x\to 0}\,(v_d - v)/\delta x = d_xv - vd_x. \tag{67}$$


We may describe any dynamical system in terms of the following dynamical
variables: the Cartesian coordinates $x$, $y$, $z$ of the centre of mass of the system,
the components $p_x$, $p_y$, $p_z$ of the total momentum of the system, which are
the canonical momenta conjugate to $x$, $y$, $z$ respectively, and any dynamical
variables needed for describing internal degrees of freedom of the system.
If we suppose a piece of apparatus which has been set up to measure $x$, to be
displaced a distance $\delta x$ in the direction of the x-axis, it will measure $x - \delta x$, hence

$$x_d = x - \delta x.$$

Comparing this with (66) for $v = x$, we obtain

$$d_xx - xd_x = -1. \tag{68}$$

This is the quantum condition connecting $d_x$ with $x$. From similar arguments
we find that $y$, $z$, $p_x$, $p_y$, $p_z$ and the internal dynamical variables, which are
unaffected by the displacement, must commute with $d_x$. Comparing these results
with (9), we see that $i\hbar d_x$ satisfies just the same quantum conditions as $p_x$.
Their difference, $p_x - i\hbar d_x$, commutes with all the dynamical variables and must
therefore be a number. This number, which is necessarily real since $p_x$ and $i\hbar d_x$
are both real, may be made zero by a suitable choice of the arbitrary* imaginary
number that can be added to $d_x$. We then have the result

$$p_x = i\hbar d_x, \tag{69}$$

or the x-component of the total momentum of the system is $i\hbar$ times
the displacement operator $d_x$.


This is a fundamental result, which gives a new significance to displacement
operators. There is a corresponding result, of course, also for the y and z
displacement operators $d_y$ and $d_z$. The quantum conditions which state that
$p_x$, $p_y$ and $p_z$ commute with each other are now seen to be connected with the fact
that displacements in different directions are commutable operations.
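The relations (62) and (63), and the rule $x_d = x - \delta x$, can be seen at work in a small finite model. The sketch below is an added illustration and not the continuous theory: it assumes a ring of `N` lattice sites with the displacement taken to be one lattice step, so that $D$ is a cyclic shift matrix, and the names `displaced`, `matmul`, `dagger` and `max_diff` are hypothetical helpers.

```python
import cmath

N = 6  # number of lattice sites on the ring (hypothetical)

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)] for i in range(N)]

def dagger(a):
    """Conjugate imaginary (conjugate transpose) of a matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(N)] for i in range(N)]

def max_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(N) for j in range(N))

IDENTITY = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]
# D moves the basic ket at site n to site n + 1 (cyclically): D|n> = |n+1>.
D = [[1.0 if i == (j + 1) % N else 0.0 for j in range(N)] for i in range(N)]
# Position observable: diagonal, with eigenvalue n at site n.
X = [[float(i) if i == j else 0.0 for j in range(N)] for i in range(N)]

def displaced(v, d):
    """v_d = d v d^{-1} as in (63), with d^{-1} taken as the conjugate of d by (62)."""
    return matmul(matmul(d, v), dagger(d))

X_d = displaced(X, D)
# The same displacement with an extra phase factor of modulus unity.
D_phase = [[cmath.exp(0.3j) * entry for entry in row] for row in D]

print("max |D-bar D - 1|        =", max_diff(matmul(dagger(D), D), IDENTITY))
print("diagonal of X            =", [round(X[n][n]) for n in range(N)])
print("diagonal of X_d          =", [round(X_d[n][n].real) for n in range(N)])
print("change from phase factor =", max_diff(displaced(X, D_phase), X_d))
```

Away from the site where the ring closes on itself, each eigenvalue of the displaced position observable is one step less than before, the lattice counterpart of $x_d = x - \delta x$, and the phase factor multiplied into $D$ leaves $x_d$ unchanged.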


* ['additive pure' omitted.]
† ['pure' omitted.]
26. Unitary transformations
Let $U$ be any linear operator that has a reciprocal $U^{-1}$ and consider the equation

$$\alpha^* = U\alpha U^{-1}, \tag{70}$$

$\alpha$ being an arbitrary linear operator. This equation may be regarded as expressing
a transformation from any linear operator $\alpha$ to a corresponding linear operator $\alpha^*$,
and as such it has rather remarkable properties. In the first place it should be
noted that each $\alpha^*$ has the same eigenvalues as the corresponding $\alpha$; since, if $\alpha'$ is
any eigenvalue of $\alpha$ and $|\alpha'\rangle$ is an eigenket belonging to it, we have

$$\alpha|\alpha'\rangle = \alpha'|\alpha'\rangle$$

and hence

$$\alpha^*U|\alpha'\rangle = U\alpha U^{-1}U|\alpha'\rangle = U\alpha|\alpha'\rangle = \alpha'U|\alpha'\rangle,$$

showing that $U|\alpha'\rangle$ is an eigenket of $\alpha^*$ belonging to the same eigenvalue $\alpha'$,
and similarly any eigenvalue of $\alpha^*$ may be shown to be also an eigenvalue
of $\alpha$. Further, if we take several $\alpha$'s that are connected by algebraic equations
and transform them all according to (70), the corresponding $\alpha^*$'s will be
connected by the same algebraic equations. This result follows from the fact
that the fundamental algebraic processes of addition and multiplication are left
invariant by the transformation (70), as is shown by the following equations:

$$\begin{aligned}
(\alpha_1 + \alpha_2)^* &= U(\alpha_1 + \alpha_2)U^{-1} = U\alpha_1U^{-1} + U\alpha_2U^{-1} = \alpha_1^* + \alpha_2^*,\\
(\alpha_1\alpha_2)^* &= U\alpha_1\alpha_2U^{-1} = U\alpha_1U^{-1}U\alpha_2U^{-1} = \alpha_1^*\alpha_2^*.
\end{aligned}$$

Let us now see what condition would be imposed on $U$ by the requirement that
any real $\alpha$ transforms into a real $\alpha^*$. Equation (70) may be written

$$\alpha^*U = U\alpha. \tag{71}$$

Taking the conjugate complex of both sides in accordance with (5) of §8 we find,
if $\alpha$ and $\alpha^*$ are both real,

$$\bar{U}\alpha^* = \alpha\bar{U}. \tag{72}$$

Equation (71) gives us

$$\bar{U}\alpha^*U = \bar{U}U\alpha,$$

and equation (72) gives us

$$\bar{U}\alpha^*U = \alpha\bar{U}U.$$

Hence

$$\bar{U}U\alpha = \alpha\bar{U}U.$$

Thus $\bar{U}U$ commutes with any real linear operator and therefore also with any
linear operator whatever, since any linear operator can be expressed as one
real one plus $i$ times another. Hence $\bar{U}U$ is a number. It is obviously real,
its conjugate complex according to (5) of §8 being the same as itself, and further
it must be a positive number, since for any ket $|P\rangle$, $\langle P|\bar{U}U|P\rangle$ is positive as
well as $\langle P|P\rangle$. We can suppose it to be unity without any loss of generality in
the transformation (70). We then have

$$\bar{U}U = 1. \tag{73}$$

Equation (73) is equivalent to any of the following

$$U = \bar{U}^{-1}, \qquad \bar{U} = U^{-1}, \qquad U^{-1}\bar{U}^{-1} = 1. \tag{74}$$


A matrix or linear operator $U$ that satisfies (73) and (74) is said to be unitary
and a transformation (70) with unitary $U$ is called a unitary transformation.
A unitary transformation transforms real linear operators into real linear operators
and leaves invariant any algebraic equation between linear operators. It may be
considered as applying also to kets and bras, in accordance with the equations

$$|P^*\rangle = U|P\rangle, \qquad \langle P^*| = \langle P|\bar{U} = \langle P|U^{-1}, \tag{75}$$

and then it leaves invariant any algebraic equation between linear operators,
kets and bras. It transforms eigenvectors of $\alpha$ into eigenvectors of $\alpha^*$. From this
one can easily deduce that it transforms an observable into an observable and that
it leaves invariant any functional relation between observables based on the general
definition of a function given in §11.

The inverse of a unitary transformation is also a unitary transformation,
since from (74), if $U$ is unitary, $U^{-1}$ is also unitary. Further, if two
unitary transformations are applied in succession, the result is a third unitary
transformation, as may be verified in the following way. Let the two unitary
transformations be (70) and

$$\alpha^\dagger = V\alpha^*V^{-1}.$$

The connexion between $\alpha^\dagger$ and $\alpha$ is then

$$\alpha^\dagger = VU\alpha U^{-1}V^{-1} = (VU)\,\alpha\,(VU)^{-1} \tag{76}$$

from (42) of §11. Now $VU$ is unitary since

$$\overline{VU}\,VU = \bar{U}\bar{V}VU = \bar{U}U = 1,$$

and hence (76) is a unitary transformation.
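These properties can be checked on matrices with two rows and columns. The sketch below is an added illustration: the particular matrices `U`, `V`, `A1`, `A2` and the helper names are hypothetical choices, with `A1` and `A2` taken Hermitian so as to stand for real linear operators.

```python
import cmath
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def dagger(a):
    """Conjugate imaginary (conjugate transpose) of a matrix."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def max_diff(a, b):
    n = len(a)
    return max(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n))

def transform(alpha, u):
    """alpha* = u alpha u^{-1} as in (70), with u^{-1} = conjugate of u for unitary u."""
    return matmul(matmul(u, alpha), dagger(u))

def trace(a):
    return a[0][0] + a[1][1]

def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

theta, phi = 0.7, 1.1  # hypothetical angles
U = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
V = [[cmath.exp(1j * phi), 0.0], [0.0, 1.0]]
ONE = [[1.0, 0.0], [0.0, 1.0]]

# Two real (Hermitian) linear operators, chosen arbitrarily.
A1 = [[1.0, 2.0 - 1.0j], [2.0 + 1.0j, -3.0]]
A2 = [[0.0, 1.0j], [-1.0j, 2.0]]

A1s, A2s = transform(A1, U), transform(A2, U)
VU = matmul(V, U)

checks = {
    "U-bar U = 1 (73)": max_diff(matmul(dagger(U), U), ONE),
    "alpha* is real": max_diff(A1s, dagger(A1s)),
    "(a1 a2)* = a1* a2*": max_diff(transform(matmul(A1, A2), U), matmul(A1s, A2s)),
    "same trace": abs(trace(A1s) - trace(A1)),
    "same determinant": abs(det(A1s) - det(A1)),
    "VU is unitary": max_diff(matmul(dagger(VU), VU), ONE),
    "U then V equals VU (76)": max_diff(transform(A1s, V), transform(A1, VU)),
}
for name, deviation in checks.items():
    print(f"{name:26s} deviation = {deviation:.2e}")
```

For a matrix with two rows and columns the trace and the determinant fix the two eigenvalues, so their invariance is a compact way of confirming that $\alpha^*$ has the same eigenvalues as $\alpha$.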

The transformation given in the preceding section from undisplaced to 
displaced quantities is an example of a unitary transformation, as is shown by 
equations (62) & (63) corresponding to equations (73) & (70) and equations (59) 
& (61) corresponding to equations (75). 

In classical mechanics one can make a transformation from the canonical
coordinates and momenta $q_r$, $p_r$ ($r = 1, \ldots, n$) to a new set of variables $q_r^*$, $p_r^*$
($r = 1, \ldots, n$) satisfying the same P.B. relations as the $q$'s and $p$'s, i.e. equations (8)
of §21 with $q^*$'s and $p^*$'s replacing the $q$'s and $p$'s, and can express all dynamical
variables in terms of the $q^*$'s and $p^*$'s. The $q^*$'s and $p^*$'s are then also called
canonical coordinates and momenta and the transformation is called a contact
transformation. One can easily verify that the P.B. of any two dynamical variables
$u$ and $v$ is correctly given by formula (1) of §21 with $q^*$'s and $p^*$'s instead of $q$'s
and $p$'s, so that the P.B. relationship is invariant under a contact transformation.
This results in the new canonical coordinates and momenta being on the same
footing as the original ones for many purposes of general dynamical theory,
even though the new coordinates $q_r^*$ may not be a set of Lagrangian coordinates
but may be functions of the Lagrangian coordinates and velocities.

It will now be shown that, for a quantum dynamical system that has a classical 
analogue, unitary transformations in the quantum theory are the analogue of 
contact transformations in the classical theory. Unitary transformations are more 
general than contact transformations, since the former can be applied to systems 
in quantum mechanics that have no classical analogue, but for those systems 
in quantum mechanics which are describable in terms of canonical coordinates
and momenta, the analogy between the two kinds of transformation holds.
To establish it, we note that a unitary transformation applied to the quantum
variables† $q_r$ & $p_r$ gives new variables $q_r^*$ & $p_r^*$ satisfying the same P.B. relations,
since the P.B. relations are equivalent to the algebraic relations (9) of §21 and
algebraic relations are left invariant by a unitary transformation. Conversely,
any real variables $q_r^*$ & $p_r^*$ satisfying the P.B. relations for canonical coordinates
and momenta are connected with the $q_r$ & $p_r$ by a unitary transformation, as is
shown by the following argument.

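We use the Schrödinger representation, and write the basic ket $|q'_1 \ldots q'_n\rangle$ as $|q'\rangle$
for brevity. Since we are assuming that the $q_r^*$ & $p_r^*$ satisfy the P.B. relations for
canonical coordinates and momenta, we can set up a Schrödinger representation
referring to them, with the $q_r^*$ diagonal and each $p_r^*$ equal to $-i\hbar\,\partial/\partial q_r^*$. The basic
kets in this second Schrödinger representation will be $|q_1^{*\prime} \ldots q_n^{*\prime}\rangle$, which we write
$|q^{*\prime}\rangle$ for brevity. Now introduce the linear operator $U$ defined by

$$\langle q^{*\prime}|U|q'\rangle = \delta(q^{*\prime} - q'), \tag{77}$$

where $\delta(q^{*\prime} - q')$ is short for

$$\delta(q^{*\prime} - q') = \delta(q_1^{*\prime} - q'_1)\,\delta(q_2^{*\prime} - q'_2)\cdots\delta(q_n^{*\prime} - q'_n). \tag{78}$$

The conjugate complex of (77) is

$$\langle q'|\bar{U}|q^{*\prime}\rangle = \delta(q^{*\prime} - q'),$$

and hence*

$$\langle q'|\bar{U}U|q''\rangle = \int \langle q'|\bar{U}|q^{*\prime}\rangle\,dq^{*\prime}\,\langle q^{*\prime}|U|q''\rangle
= \int \delta(q^{*\prime} - q')\,dq^{*\prime}\,\delta(q^{*\prime} - q'') = \delta(q' - q''),$$

so that

$$\bar{U}U = 1.$$

Thus $U$ is a unitary operator. We have further

$$\langle q^{*\prime}|q_r^*U|q'\rangle = q_r^{*\prime}\,\delta(q^{*\prime} - q')$$

and

$$\langle q^{*\prime}|Uq_r|q'\rangle = \delta(q^{*\prime} - q')\,q'_r.$$

The right-hand sides of these two equations are equal on account of the property
of the $\delta$ function (11) of §15, and hence

$$q_r^*U = Uq_r$$

or

$$q_r^* = Uq_rU^{-1}.$$

Again, from (45) and (46),

$$\langle q^{*\prime}|p_r^*U|q'\rangle = -i\hbar\,\frac{\partial}{\partial q_r^{*\prime}}\,\delta(q^{*\prime} - q'),$$

$$\langle q^{*\prime}|Up_r|q'\rangle = i\hbar\,\frac{\partial}{\partial q'_r}\,\delta(q^{*\prime} - q').$$

The right-hand sides of these two equations are obviously equal, and hence

$$p_r^*U = Up_r$$

or

$$p_r^* = Up_rU^{-1}.$$

† [Ampersands have been included in two-item lists for readability.]

* We use the notation of a single integral sign and $dq^{*\prime}$ to denote an integral over all
the variables $q_1^{*\prime}, q_2^{*\prime}, \ldots, q_n^{*\prime}$. This abbreviation will be used also in future work.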
Thus all the conditions for a unitary transformation are verified.

We get an infinitesimal unitary transformation by taking $U$ in (70) to differ by
an infinitesimal from unity. Put

$$U = 1 + i\epsilon F,$$

where $\epsilon$ is infinitesimal, so that its square can be neglected. Then

$$U^{-1} = 1 - i\epsilon F.$$

The unitary condition (73) or (74) requires that $F$ shall be real.
The transformation equation (70) now takes the form

$$\alpha^* = (1 + i\epsilon F)\,\alpha\,(1 - i\epsilon F),$$

which gives

$$\alpha^* - \alpha = i\epsilon\,(F\alpha - \alpha F). \tag{79}$$

It may be written in P.B. notation

$$\alpha^* - \alpha = \epsilon\hbar\,[\alpha, F]. \tag{80}$$

If $\alpha$ is a canonical coordinate or momentum, this is formally the same as a classical
infinitesimal contact transformation.
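The steps leading to (79) and (80) may be written out in full. The expansion below is an added sketch; it assumes only the relation $uv - vu = i\hbar[u, v]$ between a commutator and the corresponding quantum P.B.

```latex
\begin{aligned}
(1 + i\epsilon F)\,\alpha\,(1 - i\epsilon F)
  &= \alpha + i\epsilon\,(F\alpha - \alpha F) + \epsilon^2 F\alpha F, \\
i\epsilon\,(F\alpha - \alpha F)
  &= -i\epsilon\,(\alpha F - F\alpha)
   = -i\epsilon \cdot i\hbar\,[\alpha, F]
   = \epsilon\hbar\,[\alpha, F].
\end{aligned}
```

The term $\epsilon^2 F\alpha F$ is the only one dropped, which is the precise sense in which the square of $\epsilon$ is neglected.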
27. Schrödinger's form for the equations of motion
Our work from §5 onwards has all been concerned with one instant of time.
It gave the general scheme of relations between states and dynamical variables
for a dynamical system at one instant of time. To get a complete theory
of dynamics we must consider also the connexion between different instants
of time. When one makes an observation on the dynamical system, the state
of the system gets changed in an unpredictable way, but in between observations
causality applies, in quantum mechanics as in classical mechanics, and the system
is governed by equations of motion which make the state at one time determine the
state at a later time. These equations of motion we now proceed to study. They will
apply so long as the dynamical system is left undisturbed by any observation
or similar process.* Their general form can be deduced from the principle of
superposition of Chapter I.

Let us consider a particular state of motion throughout the time during which
the system is left undisturbed. We shall have the state at any time $t$ corresponding
to a certain ket which depends on $t$ and which may be written $|t\rangle$. If we deal with
several of these states of motion we distinguish them by giving them labels such
as $A$, and we then write the ket which corresponds to the state at time $t$ for one
of them $|At\rangle$. The requirement that the state at one time determines the state
at another time means that $|At_0\rangle$ determines $|At\rangle$ except for a numerical factor.
The principle of superposition applies to these states of motion throughout the time
during which the system is undisturbed, and means that if we take a superposition
relation holding for certain states at time $t_0$ and giving rise to a linear equation
between the corresponding kets, e.g. the equation

$$|Rt_0\rangle = c_1|At_0\rangle + c_2|Bt_0\rangle,$$

the same superposition relation must hold between the states of motion
throughout the time during which the system is undisturbed and must lead to
the same equation between the kets corresponding to these states at any time $t$
(in the undisturbed time interval), i.e. the equation


*The preparation of a state is a process of this kind. It often takes the form of making 
an observation and selecting the system when the result of the observation turns out to be 
a certain pre-assigned number. 
$$|Rt\rangle = c_1|At\rangle + c_2|Bt\rangle,$$

provided the arbitrary numerical factors by which these kets may be multiplied
are suitably chosen. It follows that the $|Pt\rangle$'s are linear functions of the $|Pt_0\rangle$'s
and each $|Pt\rangle$ is the result of some linear operator applied to $|Pt_0\rangle$. In symbols

$$|Pt\rangle = T|Pt_0\rangle, \tag{1}$$

where $T$ is a linear operator independent of $P$ and depending only on $t$ (and $t_0$).
We now assume that each $|Pt\rangle$ has the same length as the corresponding $|Pt_0\rangle$.
It is not necessarily possible to choose the arbitrary numerical factors by which
the $|Pt\rangle$'s may be multiplied so as to make this so without destroying the linear
dependence of the $|Pt\rangle$'s on the $|Pt_0\rangle$'s, so the new assumption is a physical one and
not just a question of notation. It involves a kind of enhancement† of the principle
of superposition. The arbitrariness in $|Pt\rangle$ now becomes merely a phase factor,
which must be independent of $P$ in order that the linear dependence of the $|Pt\rangle$'s on
the $|Pt_0\rangle$'s may be preserved. From the condition that the length of $c_1|Pt\rangle + c_2|Qt\rangle$
equals that of $c_1|Pt_0\rangle + c_2|Qt_0\rangle$ for any complex numbers $c_1$ & $c_2$, we can deduce
that

$$\langle Qt|Pt\rangle = \langle Qt_0|Pt_0\rangle. \tag{2}$$
The connexion between the $|Pt\rangle$'s and $|Pt_0\rangle$'s is formally similar to
the connexion we had in §25 between the displaced and undisplaced kets,
with a process of time displacement instead of the space displacement of §25.
Equations (1) and (2) play the part of equations (59) and (60) of §25. We can
develop the consequences of these equations as in §25 and can deduce that $T$
contains an arbitrary numerical factor of modulus unity and satisfies

$$\bar{T}T = 1, \tag{3}$$

corresponding to (62) of §25, so $T$ is unitary. We pass to the infinitesimal case by
making $t \to t_0$ and assume from physical continuity that the limit

$$\lim_{t\to t_0}\frac{|Pt\rangle - |Pt_0\rangle}{t - t_0}$$

exists. This limit is just the derivative of $|Pt_0\rangle$ with respect to $t_0$. From (1)
it equals

$$\frac{d|Pt_0\rangle}{dt_0} = \left\{\lim_{t\to t_0}\frac{T - 1}{t - t_0}\right\}|Pt_0\rangle. \tag{4}$$

The limit operator occurring here is, like (64) of §25, an‡ imaginary linear
operator and is undetermined to the extent of an arbitrary§ imaginary number.
Putting this limit operator multiplied by $i\hbar$ equal to $H$, or rather $H(t_0)$ since
it may depend on $t_0$, equation (4) becomes, when written for a general $t$,

$$i\hbar\,\frac{d|Pt\rangle}{dt} = H(t)\,|Pt\rangle. \tag{5}$$


† ['sharpening' replaced by 'enhancement'.]
‡ ['pure' omitted.]
§ ['additive pure' omitted.]
Equation (5) gives the general law for the variation with time of the ket
corresponding to the state at any time. It is Schrödinger's form for the equations
of motion. It involves just one real linear operator $H(t)$, which must be
characteristic of the dynamical system under consideration. We assume that
$H(t)$ is the total energy of the system. There are two justifications for this
assumption, (i) the analogy with classical mechanics, which will be developed
in the next section, and (ii) we have $H(t)$ appearing as $i\hbar$ times an operator of
displacement in time similar to the operators of displacement in the x, y and z
directions of §25, so corresponding to (69) of §25 we should have $H(t)$ equal to
the total energy, since the theory of relativity puts energy in the same relation to
time as momentum to distance.

We assume on physical grounds that the total energy of a system is always 
an observable. For an isolated system it is a constant, and may then be 
written H. Even when it is not a constant we shall often write it simply H, 
leaving its dependence on $t$ understood. If the energy depends on $t$, it means
the system is acted on by external forces. An action of this kind is to be 
distinguished from a disturbance caused by a process of observation, as the former 
is compatible with causality and equations of motion while the latter is not. 

We can get a connexion between $H(t)$ and the $T$ of equation (1) by substituting
for $|Pt\rangle$ in (5) its value given by equation (1). This gives

$$i\hbar\,\frac{dT}{dt}\,|Pt_0\rangle = H(t)\,T\,|Pt_0\rangle.$$

Since $|Pt_0\rangle$ may be any ket, we have

$$i\hbar\,\frac{dT}{dt} = H(t)\,T. \tag{6}$$


Equation (5) is very important for practical problems, where it is usually used in 
conjunction with a representation. Introducing a representation with a complete 
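set of commuting observables $\xi$ diagonal and putting $\langle\xi'|Pt\rangle$ equal to $\psi(\xi't)$,
we have, passing to the standard ket notation, $|Pt\rangle = \psi(\xi t)\rangle$.
Equation (5) now becomes

$$i\hbar\,\frac{\partial}{\partial t}\,\psi(\xi t)\rangle = H\,\psi(\xi t)\rangle. \tag{7}$$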


Equation (7) is known as Schrödinger’s wave equation and its solutions $\psi(\xi t)$ are
time-dependent wave functions. Each solution corresponds to a state of motion of
the system and the square of its modulus gives the probability of the $\xi$’s having
specified values at any time $t$. For a system describable in terms of canonical
coordinates and momenta we may use Schrödinger’s representation and can then
take $H$ to be an operator of differentiation in accordance with (42) of §22.
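
The content of equation (5) may be illustrated numerically. The following is a minimal sketch, not part of the original text, which integrates (5) step by step for a system having only two independent states and compares the result with the closed form available when $H$ is constant. The 2×2 matrix `H`, the choice of units `HBAR = 1` and all the function names are assumptions made for the illustration.

```python
import math

HBAR = 1.0                          # assumed units in which hbar = 1
H = [[1.0, 0.5], [0.5, -1.0]]       # illustrative real symmetric Hamiltonian

def apply(op, ket):
    """Return the ket obtained by applying a 2x2 operator to a ket."""
    return [sum(op[i][j] * ket[j] for j in range(2)) for i in range(2)]

def rhs(ket):
    """Equation (5) solved for d|Pt>/dt, i.e. H|Pt> / (i hbar)."""
    return [c / (1j * HBAR) for c in apply(H, ket)]

def rk4_step(ket, dt):
    """Advance the ket through one time step dt (classical Runge-Kutta)."""
    k1 = rhs(ket)
    k2 = rhs([ket[i] + 0.5 * dt * k1[i] for i in range(2)])
    k3 = rhs([ket[i] + 0.5 * dt * k2[i] for i in range(2)])
    k4 = rhs([ket[i] + dt * k3[i] for i in range(2)])
    return [ket[i] + dt * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) / 6
            for i in range(2)]

def evolve(ket, t, steps=2000):
    """Integrate (5) from time 0 to time t in equal steps."""
    dt = t / steps
    for _ in range(steps):
        ket = rk4_step(ket, dt)
    return ket

def exact(ket, t):
    """Closed form for this H: H*H = 1.25, so exp(-iHt/hbar) = cos(wt) - i sin(wt) H/w."""
    w = math.sqrt(1.25)
    hk = apply(H, ket)
    return [math.cos(w * t / HBAR) * ket[i] - 1j * math.sin(w * t / HBAR) * hk[i] / w
            for i in range(2)]

def norm2(ket):
    """Squared length of the ket."""
    return sum(abs(c) ** 2 for c in ket)
```

Calling `evolve([1.0 + 0j, 0j], 2.0)` and `exact([1.0 + 0j, 0j], 2.0)` should give closely agreeing kets, and `norm2` of either should stay equal to unity, as is required for $T$ to be unitary.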


28. Heisenberg’s form for the equations of motion 

In the preceding section we set up a picture of the states of undisturbed motion 
by making each of them correspond to a moving ket, the state at any time 
corresponding to the ket at that time. We shall call this the Schrödinger picture.
Let us apply to our kets the unitary transformation which makes each ket $|a\rangle$ go
over into

$$|a^*\rangle = T^{-1}|a\rangle. \tag{8}$$

This transformation is of the form given by (75) of §26 with $T^{-1}$ for $U$,
but it depends on the time $t$ since $T$ depends on $t$. It is thus to be
pictured as the application of a continuous motion (consisting of rotations 
and uniform deformations) to the whole ket vector space. A ket which is 
originally fixed becomes a moving one, its motion being given by (8) with $|a\rangle$
independent of $t$. On the other hand, a ket which is originally moving to
correspond to a state of undisturbed motion, i.e. in accordance with equation (1),
becomes fixed, since on substituting $|Pt\rangle$ for $|a\rangle$ in (8) we get $|a^*\rangle$ independent
of t. Thus the transformation brings the kets corresponding to states of undisturbed 
motion to rest. 

The unitary transformation must be applied also to bras and linear operators, 
in order that equations between the various quantities may remain invariant. 
The transformation applied to bras is given by the conjugate imaginary of (8) 
and applied to linear operators it is given by (70) of §26 with $T^{-1}$ for $U$, i.e.

$$\alpha^* = T^{-1}\,\alpha\,T. \tag{9}$$
A linear operator which is originally fixed transforms into a moving linear operator 
in general. Now a dynamical variable corresponds to a linear operator which is 
originally fixed (because it does not refer to t at all), so after the transformation 
it corresponds to a moving linear operator. The transformation thus leads us 
to a new picture of the motion, in which the states correspond to fixed vectors 
and the dynamical variables to moving linear operators. We shall call this 
the Heisenberg picture. 

The physical condition of the dynamical system at any time involves 
the relation of the dynamical variables to the state, and the change of 
the physical condition with time may be ascribed either to a change in the state, 
with the dynamical variables kept fixed, which gives us the Schrödinger picture,
or to a change in the dynamical variables, with the state kept fixed, which gives 
us the Heisenberg picture. 

In the Heisenberg picture there are equations of motion for the dynamical 
variables. Take a dynamical variable corresponding to the fixed linear operator v 
in the Schrödinger picture. In the Heisenberg picture it corresponds to a moving
linear operator, which we write as $v_t$ instead of $v^*$, to bring out its dependence
on $t$, and which is given by

$$v_t = T^{-1}\,v\,T \tag{10}$$

or

$$T\,v_t = v\,T.$$

Differentiating with respect to $t$, we get

$$\frac{dT}{dt}\,v_t + T\,\frac{dv_t}{dt} = v\,\frac{dT}{dt}.$$


With the help of (6), this gives

$$H\,T\,v_t + i\hbar\,T\,\frac{dv_t}{dt} = v\,H\,T$$

or

$$i\hbar\,\frac{dv_t}{dt} = T^{-1}vHT - T^{-1}HTv_t = v_t H_t - H_t v_t, \tag{11}$$

where

$$H_t = T^{-1}\,H\,T. \tag{12}$$

Equation (11) may be written in P.B. notation

$$\frac{dv_t}{dt} = [v_t, H_t]. \tag{13}$$


Equation (11) or (13) shows how any dynamical variable varies with time in 
the Heisenberg picture and gives us Heisenberg’s form for the equations of motion. 
These equations of motion are determined by the one linear operator $H_t$, which is
just the transform of the linear operator $H$ occurring in Schrödinger’s form for
the equations of motion and corresponds to the energy in the Heisenberg picture.
We shall call the dynamical variables in the Heisenberg picture, where they vary
with the time, Heisenberg dynamical variables, to distinguish them from
the fixed dynamical variables of the Schrödinger picture, which we shall call
Schrödinger dynamical variables. Each Heisenberg dynamical variable is connected
with the corresponding Schrödinger dynamical variable by equation (10).
Since this connexion is a unitary transformation, all algebraic and functional
relationships are the same for both kinds of dynamical variable. We have $T = 1$
for $t = t_0$, so that $v_{t_0} = v$ and any Heisenberg dynamical variable at time $t_0$ equals
the corresponding Schrödinger dynamical variable.
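
Heisenberg’s equation of motion may be checked numerically in a simple case. The sketch below, which is an illustration and not part of the original text, forms $v_t$ from (10) for a system with two independent states and a constant $H$, and compares a finite-difference estimate of $dv_t/dt$ with the P.B. on the right-hand side of (13), the quantum P.B. being $(v_tH_t - H_tv_t)/i\hbar$. The matrices `H` and `V`, the choice $t_0 = 0$, the units `HBAR = 1` and the function names are all assumptions.

```python
import math

HBAR = 1.0                      # assumed units in which hbar = 1
H = [[1.0, 0.5], [0.5, -1.0]]   # illustrative constant Hamiltonian
V = [[0.0, 1.0], [1.0, 0.0]]    # illustrative Schrodinger dynamical variable v

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    """Conjugate transpose, which is the reciprocal of a unitary matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def T(t):
    """T = exp(-iHt/hbar) with t0 = 0; valid because H*H = 1.25 for this H."""
    w = math.sqrt(1.25)
    c, s = math.cos(w * t / HBAR), math.sin(w * t / HBAR)
    return [[c * (i == j) - 1j * s * H[i][j] / w for j in range(2)]
            for i in range(2)]

def heisenberg(op, t):
    """Equation (10): op_t = T^{-1} op T."""
    Tt = T(t)
    return matmul(dagger(Tt), matmul(op, Tt))

def poisson_bracket(t):
    """Right-hand side of (13): (v_t H_t - H_t v_t) / (i hbar)."""
    vt, Ht = heisenberg(V, t), heisenberg(H, t)
    a, b = matmul(vt, Ht), matmul(Ht, vt)
    return [[(a[i][j] - b[i][j]) / (1j * HBAR) for j in range(2)]
            for i in range(2)]

def time_derivative(t, eps=1e-5):
    """Central-difference estimate of dv_t/dt."""
    p, m = heisenberg(V, t + eps), heisenberg(V, t - eps)
    return [[(p[i][j] - m[i][j]) / (2 * eps) for j in range(2)]
            for i in range(2)]
```

For any time `t` the matrices returned by `time_derivative(t)` and `poisson_bracket(t)` should agree to within the accuracy of the finite difference, and `heisenberg(V, 0.0)` should reproduce `V`, in accordance with $v_{t_0} = v$.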

Equation (13) can be compared with classical mechanics, where we also have 
dynamical variables varying with the time. The equations of motion of classical 


mechanics can be written in the Hamiltonian form

$$\frac{dq_r}{dt} = \frac{\partial H}{\partial p_r}, \qquad \frac{dp_r}{dt} = -\frac{\partial H}{\partial q_r}, \tag{14}$$
where the q’s and p’s are a set of canonical coordinates and momenta and H is 
the energy expressed as a function of them and possibly also of t. The energy 
expressed in this way is called the Hamiltonian. Equations (14) give, for v any 


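function of the q’s and p’s that does not contain the time $t$ explicitly,

$$\frac{dv}{dt} = \sum_r\left\{\frac{\partial v}{\partial q_r}\frac{dq_r}{dt} + \frac{\partial v}{\partial p_r}\frac{dp_r}{dt}\right\}
= \sum_r\left\{\frac{\partial v}{\partial q_r}\frac{\partial H}{\partial p_r} - \frac{\partial v}{\partial p_r}\frac{\partial H}{\partial q_r}\right\}
= [v, H], \tag{15}$$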
with the classical definition of a P.B., equation (1) of §21. This is of the same 
form as equation (13) in the quantum theory. We thus get an analogy between 
the classical equations of motion in the Hamiltonian form and the quantum 
equations of motion in Heisenberg’s form. This analogy provides a justification 
for the assumption that the linear operator H introduced in the preceding section 
is the energy of the system in quantum mechanics. 
In classical mechanics a dynamical system is defined mathematically when
the Hamiltonian is given, i.e. when the energy is given in terms of a set of 
canonical coordinates and momenta, as this is sufficient to fix the equations 
of motion. In quantum mechanics a dynamical system is defined mathematically 
when the energy is given in terms of dynamical variables whose commutation 
relations are known, as this is then sufficient to fix the equations of motion, 
in both Schrödinger’s and Heisenberg’s form. We need to have either $H$
expressed in terms of the Schrödinger dynamical variables or $H_t$ expressed
in terms of the corresponding Heisenberg dynamical variables, the functional 
relationship being, of course, the same in both cases. We call the energy expressed 
in this way the Hamiltonian of the dynamical system in quantum mechanics, 
to keep up the analogy with the classical theory. 


A system in quantum mechanics always has a Hamiltonian, whether the system 
is one that has a classical analogue and is describable in terms of canonical 
coordinates and momenta or not. However, if the system does have a classical 
analogue, its connexion with classical mechanics is specially close and one can 
usually assume that the Hamiltonian is the same function of the canonical 
coordinates and momenta in the quantum theory as in the classical theory.†
There would be a difficulty in this, of course, if the classical Hamiltonian involved 
a product of factors whose quantum analogues do not commute, as one would not 
know in which order to put these factors in the quantum Hamiltonian, but this 
does not happen for most of the elementary dynamical systems whose study is 
important for atomic physics. In consequence we are able also largely to use 
the same language for describing dynamical systems in the quantum theory as in 
the classical theory (e.g. to talk about particles with given masses moving through 
given fields of force), and when given a system in classical mechanics, can usually 
give a meaning to ‘the same’ system in quantum mechanics. 


Equation (13) holds for $v_t$ any function of the Heisenberg dynamical variables
not involving the time explicitly, i.e. for $v$ any constant linear operator in
the Schrödinger picture. It shows that such a function $v_t$ is constant if it commutes
with $H_t$ or if $v$ commutes with $H$. We then have $v_t = v_{t_0} = v$, and we call $v_t$ or
$v$ a constant of the motion. It is necessary that $v$ shall commute with $H$ at all
times, which is usually possible only if $H$ is constant. In this case we can substitute
$H$ for $v$ in (13) and deduce that $H_t$ is constant, showing that $H$ itself is then a
constant of the motion. Thus if the Hamiltonian is constant in the Schrödinger
picture, it is also constant in the Heisenberg picture.


†This assumption is found in practice to be successful only when applied with the dynamical
coordinates and momenta referring to a Cartesian system of axes and not to more general 
curvilinear coordinates. 
For an isolated system, a system not acted on by any external forces, there are
always certain constants of the motion. One of these is the total energy or 
Hamiltonian. Others are provided by the displacement theory of §25. It is 
evident physically that the total energy must remain unchanged if all the dynamical 
variables are displaced in a certain way, so equation (63) of §25 must hold with 
$v_d = v = H$. Thus $D$ commutes with $H$ and is a constant of the motion.
Passing to the case of an infinitesimal displacement, we see that the displacement
operators $d_x$, $d_y$ and $d_z$ are constants of the motion and hence, from (69) of §25,
the total momentum is a constant of the motion. Again, the total energy must 
remain unchanged if all the dynamical variables are subjected to a certain rotation. 
This leads, as will be shown in §35, to the result that the total angular momentum 
is a constant of the motion. The laws of conservation of energy, momentum 
and angular momentum hold for an isolated system in the Heisenberg picture in 
quantum mechanics, as they hold in classical mechanics. 

Two forms for the equations of motion of quantum mechanics have now 
been given. Of these, the Schrödinger form is the more useful one for
practical problems, as it provides the simpler equations. The unknowns in
Schrödinger’s wave equation are the numbers which form the representative of
a ket vector, while Heisenberg’s equation of motion for a dynamical variable,
if expressed in terms of a representation, would involve as unknowns the numbers
forming the representative of the dynamical variable. The latter are far more
numerous and therefore more difficult to evaluate than the Schrödinger unknowns.
Werner Heisenberg’s form for the equations of motion is of value in providing 
an immediate analogy with classical mechanics and enabling one to see how 
various features of classical theory, such as the conservation laws referred to above, 
are translated into quantum theory. 


29. Stationary states 
We shall here deal with a dynamical system whose energy is constant. 
Certain specially simple relations hold for this case. Equation (6) can be
integrated* to give

$$T = e^{-iH(t-t_0)/\hbar},$$

with the help of the initial condition that $T = 1$ for $t = t_0$. This result substituted
into (1) gives

$$|Pt\rangle = e^{-iH(t-t_0)/\hbar}\,|Pt_0\rangle, \tag{16}$$

which is the integral of Schrödinger’s equation of motion (5), and substituted into
(10) it gives

$$v_t = e^{iH(t-t_0)/\hbar}\,v\,e^{-iH(t-t_0)/\hbar}, \tag{17}$$

which is the integral of Heisenberg’s equation of motion (11), $H_t$ being now equal
to $H$. Thus we have solutions of the equations of motion in a simple form. However,
these solutions are not of much practical value, because of the difficulty involved 


*The integration can be carried out as though $H$ were an ordinary algebraic variable instead
of a linear operator, because there is no quantity that does not commute with $H$ in the work.


in evaluating the operator $e^{-iH(t-t_0)/\hbar}$ unless $H$ is particularly simple, and for
practical purposes one usually has to† rely on Schrödinger’s wave equation.

Let us consider a state of motion such that at time $t_0$ it is an eigenstate of
the energy. The ket $|Pt_0\rangle$ corresponding to it at this time must be an eigenket
of $H$. If $H'$ is the eigenvalue to which it belongs, equation (16) gives

$$|Pt\rangle = e^{-iH'(t-t_0)/\hbar}\,|Pt_0\rangle,$$

showing that $|Pt\rangle$ differs from $|Pt_0\rangle$ only by a phase factor. Thus the state
always remains an eigenstate of the energy, and further, it does not vary with
the time at all, since the direction of the ket $|Pt\rangle$ does not vary with the time.
Such a state is called a stationary state. The probability for any particular 
result of an observation on it is independent of the time when the observation 
is made. From our assumption that the energy is an observable, there are sufficient 
stationary states for an arbitrary state to be dependent on them. 
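
The distinguishing property of a stationary state may be seen in a small numerical example. The sketch below is an illustration only: it takes an assumed 2×2 Hamiltonian `H` with eigenvalues $\pm W$, moves an eigenket and a ket that is not an eigenket according to (16) with $t_0 = 0$, and compares the resulting probabilities. The matrix, the units `HBAR = 1` and the function names are assumptions and do not occur in the original.

```python
import cmath
import math

HBAR = 1.0                        # assumed units in which hbar = 1
H = [[1.0, 0.5], [0.5, -1.0]]     # illustrative Hamiltonian, eigenvalues +W and -W
W = math.sqrt(1.25)

def apply(op, ket):
    """Apply a 2x2 operator to a ket."""
    return [sum(op[i][j] * ket[j] for j in range(2)) for i in range(2)]

def evolve(ket, t):
    """Equation (16) with t0 = 0, using H*H = W*W for this particular H."""
    hk = apply(H, ket)
    return [math.cos(W * t / HBAR) * ket[i] - 1j * math.sin(W * t / HBAR) * hk[i] / W
            for i in range(2)]

def eigenket():
    """Normalized eigenket of H belonging to the eigenvalue H' = +W."""
    x, y = 1.0, 2.0 * (W - 1.0)
    n = math.hypot(x, y)
    return [x / n, y / n]

def probabilities(ket):
    """Squares of the moduli of the two components of the ket."""
    return [abs(c) ** 2 for c in ket]

def phase_factor(t):
    """The factor exp(-i H' t / hbar) of a stationary state with H' = +W."""
    return cmath.exp(-1j * W * t / HBAR)
```

For the eigenket, `evolve(eigenket(), t)` should equal `phase_factor(t)` times the initial ket, so that `probabilities` is independent of `t`. For a ket such as `[1.0, 0.0]`, which is a superposition of the two stationary states, the probabilities vary with `t`.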

The time-dependent wave function $\psi(\xi t)$ representing a stationary state of
energy $H'$ will vary with time according to the law

$$\psi(\xi t) = \psi_0(\xi)\,e^{-iH't/\hbar}, \tag{18}$$

and Schrödinger’s wave equation (7) for it reduces to

$$H'\,\psi_0\rangle = H\,\psi_0\rangle. \tag{19}$$

This equation merely asserts that the state represented by $\psi_0$ is an eigenstate
of $H$. We call a function $\psi_0$ satisfying (19) an eigenfunction of $H$, belonging to
the eigenvalue $H'$.

In the Heisenberg picture the stationary states correspond to fixed eigenvectors 
of the energy. We can set up a representation in which all the basic vectors 
are eigenvectors of the energy and so correspond to stationary states in 
the Heisenberg picture. We call such a representation a Heisenberg representation. 
The first form of quantum mechanics, discovered by Werner Heisenberg in 1925, 
was in terms of a representation of this kind. The energy is diagonal in 
the representation. Any other diagonal dynamical variable must commute with 
the energy and is therefore a constant of the motion. The problem of setting up 
a Heisenberg representation thus reduces to the problem of finding a complete set of 
commuting observables, each of which is a constant of the motion, and then making 
these observables diagonal. The energy must be a function of these observables, 
from Theorem 2 of §19. It is sometimes convenient to take the energy itself as one
of them. 

Let $\alpha$ denote the complete set of commuting observables in a Heisenberg
representation, so that the basic vectors are written $\langle\alpha'|$, $|\alpha''\rangle$. The energy is
a function of these observables $\alpha$, say $H = H(\alpha)$. From (17) we get

$$\langle\alpha'|v_t|\alpha''\rangle = \langle\alpha'|e^{iH(t-t_0)/\hbar}\,v\,e^{-iH(t-t_0)/\hbar}|\alpha''\rangle
= e^{i(H'-H'')(t-t_0)/\hbar}\,\langle\alpha'|v|\alpha''\rangle, \tag{20}$$


†[Original: ‘fall back’.]
where $H' = H(\alpha')$ and $H'' = H(\alpha'')$. The factor $\langle\alpha'|v|\alpha''\rangle$ on the right-hand side
here is independent of $t$, being an element of the matrix representing the fixed
linear operator $v$. Formula (20) shows how the Heisenberg matrix elements of any
Heisenberg dynamical variable vary with time, and it makes $v_t$ satisfy the equation
of motion (11), as is easily verified. The variation given by (20) is simply periodic
with the frequency

$$|H' - H''|/2\pi\hbar = |H' - H''|/h, \tag{21}$$

depending only on the energy difference of the two stationary states to which
the matrix element refers. This result is closely connected with the Combination
Law of Spectroscopy and Bohr’s Frequency Condition, according to which
(21) is the frequency of the electromagnetic radiation emitted or absorbed
when the system makes a transition under the influence of radiation between
the stationary states $\alpha'$ and $\alpha''$, the eigenvalues of $H$ being Bohr’s energy levels.
These matters will be dealt with in §45. 
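
Formulas (20) and (21) lend themselves to a direct numerical illustration. In the sketch below, which is not part of the original text, the three energy levels `ENERGIES`, the fixed matrix `V`, the units `HBAR = 1` and the function names are all assumed for the purpose of the example.

```python
import cmath
import math

HBAR = 1.0                            # assumed units in which hbar = 1
ENERGIES = [0.3, 1.1, 2.6]            # illustrative eigenvalues H(alpha')
V = [[0.0, 0.4, 0.1],
     [0.4, 0.0, 0.7],
     [0.1, 0.7, 0.0]]                 # illustrative fixed matrix <alpha'|v|alpha''>

def heisenberg_element(a, b, t, t0=0.0):
    """Equation (20): the (a, b) Heisenberg matrix element of v_t at time t."""
    phase = cmath.exp(1j * (ENERGIES[a] - ENERGIES[b]) * (t - t0) / HBAR)
    return phase * V[a][b]

def frequency(a, b):
    """Equation (21): |H' - H''| / (2 pi hbar)."""
    return abs(ENERGIES[a] - ENERGIES[b]) / (2 * math.pi * HBAR)
```

Each off-diagonal element returns to its value after the time `1 / frequency(a, b)`, while the diagonal elements, for which $H' = H''$, do not vary at all.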


30. The free particle 


The most fundamental and elementary application of quantum mechanics is to 
the system consisting merely of a free particle, or particle not acted on by 
any forces. For dealing with it we use as dynamical variables the three Cartesian 
coordinates $x$, $y$, $z$ and their conjugate momenta $p_x$, $p_y$, $p_z$. The Hamiltonian is
equal to the kinetic energy of the particle, namely

$$H = \frac{1}{2m}\,(p_x^2 + p_y^2 + p_z^2) \tag{22}$$

according to Newtonian mechanics, $m$ being the mass. This formula is valid
only if the velocity of the particle is small compared with $c$, the velocity of light.
For a rapidly moving particle, such as we often have to deal with in atomic theory,
(22) must be replaced by the relativistic formula

$$H = c\,(m^2c^2 + p_x^2 + p_y^2 + p_z^2)^{1/2}. \tag{23}$$

For small values of $p_x$, $p_y$ and $p_z$ (23) goes over into (22), except for the constant
term $mc^2$ which corresponds to the rest-energy of the particle in the theory of
relativity and which has no influence on the equations of motion. Formulae (22)
and (23) can be taken over directly into the quantum theory, the square root in
(23) being now understood as the positive square root defined at the end of §11.
The constant term $mc^2$ by which (23) differs from (22) for small values of $p_x$, $p_y$ and
$p_z$ can still have no physical effects, since the Hamiltonian in the quantum theory,
as introduced in §27, is undefined to the extent of an arbitrary* real constant.
We shall here work with the accurate formula (23). We shall first solve
the Heisenberg equations of motion. From the quantum conditions (9) of §21,
$p_x$ commutes with $p_y$ and $p_z$ and hence, from Theorem 1 of §19 extended to a set
of commuting observables, $p_x$ commutes with any function of $p_x$, $p_y$ and $p_z$ and
therefore with $H$. It follows that $p_x$ is a constant of the motion. Similarly $p_y$ and $p_z$


*[‘additive’ omitted.]
are constants of the motion. These results are the same as in the classical theory.
Again, the equation of motion for a coordinate, $x_t$ say, is, according to (11),

$$i\hbar\,\dot{x}_t = i\hbar\,\frac{dx_t}{dt} = x_t\,c\,(m^2c^2 + p_x^2 + p_y^2 + p_z^2)^{1/2} - c\,(m^2c^2 + p_x^2 + p_y^2 + p_z^2)^{1/2}\,x_t.$$

The right-hand side here can be evaluated by means of formula (31) of §22 with
the roles of coordinates and momenta interchanged, so that it reads

$$q_r f - f q_r = i\hbar\,\partial f/\partial p_r, \tag{24}$$

$f$ now being any function of the p’s. This gives

$$\dot{x}_t = \frac{\partial}{\partial p_x}\,c\,(m^2c^2 + p_x^2 + p_y^2 + p_z^2)^{1/2} = \frac{c^2 p_x}{H}.$$

Similarly,

$$\dot{y}_t = \frac{c^2 p_y}{H}, \qquad \dot{z}_t = \frac{c^2 p_z}{H}. \tag{25}$$

The magnitude of the velocity is

$$v = (\dot{x}_t^2 + \dot{y}_t^2 + \dot{z}_t^2)^{1/2} = c^2\,(p_x^2 + p_y^2 + p_z^2)^{1/2}/H. \tag{26}$$

Equations (25) and (26) are just the same as in the classical theory.
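Let us consider a state that is an eigenstate of the momenta, belonging to
the eigenvalues $p'_x$, $p'_y$, $p'_z$. This state must be an eigenstate of the Hamiltonian,
belonging to the eigenvalue

$$H' = c\,(m^2c^2 + p_x'^2 + p_y'^2 + p_z'^2)^{1/2}, \tag{27}$$

and must therefore be a stationary state. The possible values for $H'$ are all numbers
from $mc^2$ to $\infty$, as in the classical theory. The wave function $\psi(x, y, z)$ representing
this state at any time in Schrödinger’s representation must satisfy

$$p'_x\,\psi(x, y, z)\rangle = p_x\,\psi(x, y, z)\rangle = -i\hbar\,\frac{\partial\psi(x, y, z)}{\partial x}\Big\rangle,$$

with similar equations for $p_y$ and $p_z$. These equations show that $\psi(x, y, z)$ is of
the form

$$\psi(x, y, z) = a\,e^{i(p'_x x + p'_y y + p'_z z)/\hbar}, \tag{28}$$

where $a$ is independent of $x$, $y$ and $z$. From (18) we see now that
the time-dependent wave function $\psi(x, y, z, t)$ is of the form

$$\psi(x, y, z, t) = a_0\,e^{i(p'_x x + p'_y y + p'_z z - H't)/\hbar}, \tag{29}$$

where $a_0$ is independent of $x$, $y$, $z$ and $t$.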

The function (29) of $x$, $y$, $z$ and $t$ describes plane waves in space-time. We see
from this example the suitability of the terms ‘wave function’ and ‘wave equation’.
The frequency of the waves is

$$\nu = H'/h, \tag{30}$$

their wavelength is

$$\lambda = h/(p_x'^2 + p_y'^2 + p_z'^2)^{1/2} = h/P', \tag{31}$$

$P'$ being the length of the vector $(p'_x, p'_y, p'_z)$, and their motion is in the direction
specified by the vector $(p'_x, p'_y, p'_z)$ with the velocity

$$\lambda\nu = H'/P' = c^2/v', \tag{32}$$

$v'$ being the velocity of the particle corresponding to the momentum $(p'_x, p'_y, p'_z)$
as given by formula (26). Equations (30), (31) and (32) are easily seen to hold
in all Lorentz frames of reference, the expression on the right-hand side of (29)
being, in fact, relativistically invariant with $p'_x$, $p'_y$, $p'_z$ and $H'$ as the components
of a 4-vector. These properties of relativistic invariance led Louis de Broglie,


before the discovery of quantum mechanics, to postulate the existence of waves
of the form (29) associated with the motion of any particle. They are therefore 
known as de Broglie waves. 

In the limiting case when the mass m is made to tend to zero, the classical 
velocity of the particle v becomes equal to c and hence, from (32), the wave velocity 
also becomes c. The waves are then like the light-waves associated with a photon, 
with the difference that they contain no reference to the polarization and involve 
a complex exponential instead of sines and cosines. Formulae (30) and (31) are 
still valid, connecting the frequency of the light-waves with the energy of the photon 
and the wavelength of the light-waves with the momentum of the photon. 

For the state represented by (29), the probability of the particle being found 
in any specified small volume when an observation of its position is made is 
independent of where the volume is. This provides an example of Heisenberg’s 
principle of uncertainty, the state being one for which the momentum is accurately 
given and for which, in consequence, the position is completely unknown. 
Such a state is, of course, a limiting case which never occurs in practice. The states 
usually met with in practice are those represented by wave packets, which may 
be formed by superposing a number of waves of the type (29) belonging to 
slightly different values of $(p'_x, p'_y, p'_z)$, as discussed in §24. The ordinary formula
in hydrodynamics for the velocity of such a wave packet, i.e. the group velocity of
the waves, is

$$\frac{d\nu}{d(1/\lambda)}, \tag{33}$$

which gives, from (30) and (31),

$$\frac{dH'}{dP'} = c\,\frac{d}{dP'}\,(m^2c^2 + P'^2)^{1/2} = \frac{c^2 P'}{H'} = v'. \tag{34}$$


This is just the velocity of the particle. The wave packet moves in the same 
direction and with the same velocity as the particle moves in classical mechanics. 
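
The relations between the particle velocity (26), the wave velocity (32) and the group velocity (34) may be checked numerically. The following is a minimal sketch under stated assumptions: units in which `C = 1`, a mass `M = 1`, and function names chosen for the illustration, none of which occur in the original text. The group velocity is estimated by a finite difference so as to provide an independent check on the closed form.

```python
import math

C = 1.0    # assumed units with c = 1
M = 1.0    # assumed mass of the particle

def energy(P):
    """Equation (27): H' = c (m^2 c^2 + P'^2)^(1/2)."""
    return C * math.sqrt(M * M * C * C + P * P)

def particle_velocity(P):
    """Equation (26): v' = c^2 P' / H'."""
    return C * C * P / energy(P)

def phase_velocity(P):
    """Equation (32): lambda * nu = H' / P'."""
    return energy(P) / P

def group_velocity(P, dP=1e-6):
    """Equation (34): dH'/dP', estimated by a central difference."""
    return (energy(P + dP) - energy(P - dP)) / (2 * dP)
```

For any momentum `P` the value of `group_velocity(P)` should agree with `particle_velocity(P)`, which is always less than `C`, while the product of `phase_velocity(P)` and `particle_velocity(P)` should equal `C * C`.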


31. The motion of wave packets 
The result just deduced for a free particle is an example of a general 
principle. For any dynamical system with a classical analogue, a state for 
which the classical description is valid as an approximation is represented in 
quantum mechanics by a wave packet, all the coordinates and momenta having 
approximate numerical values, whose accuracy is limited by Heisenberg’s principle 
of uncertainty. Now Schrödinger’s wave equation fixes how such a wave packet
varies with time, so in order that the classical description may remain valid, 
the wave packet should remain a wave packet and should move according to 
the laws of classical dynamics. We shall verify that this is so. 

We take a dynamical system having a classical analogue and let its Hamiltonian
be $H(q_r, p_r)$ $(r = 1, 2, \ldots, n)$. The corresponding classical dynamical system
will have as Hamiltonian $H_c(q_r, p_r)$ say, obtained by putting ordinary algebraic
variables for the $q_r$ and $p_r$ in $H(q_r, p_r)$ and making $\hbar \to 0$ if it occurs in $H(q_r, p_r)$.
The classical Hamiltonian $H_c$ is, of course, a real function of its variables. It is
usually a quadratic function of the momenta $p_r$, but not always so, the relativistic
theory of a free particle being an example where it is not. The following argument
is valid for $H_c$ any algebraic function of the p’s.

We suppose that the time-dependent wave function in Schrödinger’s
representation is of the form

$$\psi(qt) = A\,e^{iS/\hbar}, \tag{35}$$

where $A$ and $S$ are real functions of the q’s and $t$ which do not vary very rapidly
with their arguments. The wave function is then of the form of waves, with $A$ and $S$
determining the amplitude and phase respectively. Schrödinger’s wave equation
(7) gives

$$i\hbar\,\frac{\partial}{\partial t}\,A\,e^{iS/\hbar}\rangle = H(q_r, p_r)\,A\,e^{iS/\hbar}\rangle$$

or

$$\left\{i\hbar\,\frac{\partial A}{\partial t} - A\,\frac{\partial S}{\partial t}\right\}\rangle = e^{-iS/\hbar}\,H(q_r, p_r)\,A\,e^{iS/\hbar}\rangle. \tag{36}$$

Now $e^{-iS/\hbar}$ is evidently a unitary linear operator and may be used for $U$ in
equation (70) of §26 to give us a unitary transformation. The q’s remain unchanged
by this transformation, each $p_r$ goes over into

$$e^{-iS/\hbar}\,p_r\,e^{iS/\hbar} = p_r + \partial S/\partial q_r,$$

with the help of (31) of §22, and $H$ goes over into

$$e^{-iS/\hbar}\,H(q_r, p_r)\,e^{iS/\hbar} = H(q_r, p_r + \partial S/\partial q_r),$$

since algebraic relations are preserved by the transformation. Thus (36) becomes

$$\left\{i\hbar\,\frac{\partial A}{\partial t} - A\,\frac{\partial S}{\partial t}\right\}\rangle = H\!\left(q_r, p_r + \frac{\partial S}{\partial q_r}\right)A\rangle. \tag{37}$$

Let us now suppose that $\hbar$ can be counted as small and let us neglect terms
involving $\hbar$ in (37). This involves neglecting the $p_r$’s that occur in $H$ in (37),
since each $p_r$ is equivalent to the operator $-i\hbar\,\partial/\partial q_r$ operating on the functions of
the q’s to the right of it. The surviving terms give

$$-\frac{\partial S}{\partial t} = H_c\!\left(q_r, \frac{\partial S}{\partial q_r}\right). \tag{38}$$

This is a differential equation which the phase function $S$ has to satisfy.
The equation is determined by the classical Hamiltonian function $H_c$ and is known
as the Hamilton-Jacobi equation in classical dynamics. It allows $S$ to be real and so
shows that the assumption of the wave form (35) does not lead to an inconsistency.

To obtain an equation for $A$, we must retain the terms in (37) which are linear
in $\hbar$ and see what they give. A direct evaluation of these terms is rather awkward
in the case of a general function $H$, and we can get the result we require more
easily by first multiplying both sides of (37) by the bra vector $\langle Af$, where $f$ is
an arbitrary real function of the q’s. This gives

$$\langle Af\left\{i\hbar\,\frac{\partial A}{\partial t} - A\,\frac{\partial S}{\partial t}\right\}\rangle = \langle Af\,H\!\left(q_r, p_r + \frac{\partial S}{\partial q_r}\right)A\rangle.$$


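The conjugate complex equation is

$$\langle Af\left\{-i\hbar\,\frac{\partial A}{\partial t} - A\,\frac{\partial S}{\partial t}\right\}\rangle = \langle A\,H\!\left(q_r, p_r + \frac{\partial S}{\partial q_r}\right)fA\rangle.$$

Subtracting the second from the first above and dividing out by $i\hbar$, we obtain

$$2\,\langle Af\,\frac{\partial A}{\partial t}\rangle = \langle A\left[f, H\!\left(q_r, p_r + \frac{\partial S}{\partial q_r}\right)\right]A\rangle. \tag{39}$$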


We now have to evaluate the P.B. $[f, H(q_r, p_r + \partial S/\partial q_r)]$.
Our assumption that $\hbar$ can be counted as small enables us to expand
$H(q_r, p_r + \partial S/\partial q_r)$ as a power series in the p’s. The terms of zero degree will
contribute nothing to the P.B. The terms of the first degree in the p’s give
a contribution to the P.B. which can be evaluated most easily with the help of
the classical formula (1) of §21 (this formula being valid also in the quantum
theory if $u$ is independent of the p’s and $v$ is linear in the p’s). The amount of
this contribution is

$$\sum_s \frac{\partial f}{\partial q_s}\left[\frac{\partial H_c(q_r, p_r)}{\partial p_s}\right]_{p_r = \partial S/\partial q_r},$$

the notation meaning that we must substitute $\partial S/\partial q_r$ for each $p_r$ in the function
$[\ ]$ of the q’s and p’s, so as to obtain a function of the q’s only. The terms of higher
degree in the p’s give contributions to the P.B. which vanish when $\hbar \to 0$. Thus (39)
becomes, with neglect of terms involving $\hbar$, which is equivalent to the neglect of
$\hbar^2$ in (37),

$$\langle f\,\frac{\partial A^2}{\partial t}\rangle = \langle A^2 \sum_s \frac{\partial f}{\partial q_s}\left[\frac{\partial H_c(q_r, p_r)}{\partial p_s}\right]_{p_r = \partial S/\partial q_r}\rangle. \tag{40}$$

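Now if $a(q)$ and $b(q)$ are any two functions of the q’s, formula (64) of §20 gives

$$\langle a(q)\,b(q)\rangle = \int a(q')\,dq'\,b(q'),$$

and so

$$\langle a(q)\,\frac{\partial b(q)}{\partial q_r}\rangle = -\langle \frac{\partial a(q)}{\partial q_r}\,b(q)\rangle, \tag{41}$$

provided $a(q)$ and $b(q)$ satisfy suitable boundary conditions, as discussed in §§22
and 23. Hence (40) may be written

$$\langle f\,\frac{\partial A^2}{\partial t}\rangle = -\langle f \sum_s \frac{\partial}{\partial q_s}\left\{A^2\left[\frac{\partial H_c(q_r, p_r)}{\partial p_s}\right]_{p_r = \partial S/\partial q_r}\right\}\rangle.$$

Since this holds for an arbitrary real function $f$, we must have

$$\frac{\partial A^2}{\partial t} = -\sum_s \frac{\partial}{\partial q_s}\left\{A^2\left[\frac{\partial H_c(q_r, p_r)}{\partial p_s}\right]_{p_r = \partial S/\partial q_r}\right\}. \tag{42}$$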


This is the equation for the amplitude $A$ of the wave function. To get
an understanding of its significance, let us suppose we have a fluid moving in
the space of the variables $q$, the density of the fluid at any point and time being
$A^2$ and its velocity being

$$\frac{dq_s}{dt} = \left[\frac{\partial H_c(q_r, p_r)}{\partial p_s}\right]_{p_r = \partial S/\partial q_r}. \tag{43}$$
Equation (42) is then just the equation of conservation for such a fluid. The motion
of the fluid is determined by the function S satisfying (38), there being one possible 
motion for each solution of (38). 
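
Equations (38), (42) and (43) may be illustrated in the simplest case. The sketch below is an assumption-laden example and not part of the original argument: it takes one degree of freedom with the Newtonian Hamiltonian $H_c = p^2/2m$, the particular solution $S = p_0 q - p_0^2 t/2m$ of (38), and a Gaussian density $A^2$ carried along with the fluid, and evaluates by finite differences the quantities which (38) and (42) require to vanish. The constants `M` and `P0` and all the function names are hypothetical.

```python
import math

M = 1.0     # assumed mass
P0 = 2.0    # assumed constant momentum appearing in S

def S(q, t):
    """A particular solution of (38) for H_c = p^2 / 2m."""
    return P0 * q - P0 * P0 * t / (2 * M)

def H_c(q, p):
    """Classical Hamiltonian of a free particle in one dimension."""
    return p * p / (2 * M)

def d(f, x, eps=1e-5):
    """Central-difference derivative of f at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def fluid_velocity(q, t):
    """Equation (43): dH_c/dp evaluated at p = dS/dq."""
    p = d(lambda x: S(x, t), q)
    return d(lambda pp: H_c(q, pp), p)

def density(q, t):
    """A^2 for a Gaussian packet moving with the velocity P0 / M."""
    return math.exp(-(q - P0 * t / M) ** 2)

def hamilton_jacobi_residual(q, t):
    """-dS/dt - H_c(q, dS/dq), which (38) requires to vanish."""
    return -d(lambda x: S(q, x), t) - H_c(q, d(lambda x: S(x, t), q))

def continuity_residual(q, t):
    """dA^2/dt + d(A^2 * velocity)/dq, which (42) requires to vanish."""
    flux = lambda x: density(x, t) * fluid_velocity(x, t)
    return d(lambda x: density(q, x), t) + d(flux, q)
```

Both residuals should be zero to within the accuracy of the finite differences at any point `(q, t)`, and `fluid_velocity` should return `P0 / M`, the classical velocity.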

For a given S, let us take a solution of (42) for which at some definite time 
the density $A^2$ vanishes everywhere outside a certain small region. We may suppose
this region to move with the fluid, its velocity at each point being given by (43), 
and then the equation of conservation (42) will require the density always to vanish 
outside the region. There is a limit to how small the region may be, imposed by 
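the approximation we made in neglecting $\hbar$ in (39). This approximation is valid
only provided

$$\hbar\,\frac{\partial A}{\partial q_r} \ll A\,\frac{\partial S}{\partial q_r},$$

or

$$\frac{1}{A}\frac{\partial A}{\partial q_r} \ll \frac{1}{\hbar}\frac{\partial S}{\partial q_r},$$

which requires that $A$ shall vary by an appreciable fraction of itself only through
a range of the q’s in which $S$ varies by many times $\hbar$, i.e. a range consisting of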
many wavelengths of the wave function (35). Our solution is then a wave packet 
of the type discussed in §24 and remains so for all time. 

We thus get a wave function representing a state of motion for which 
the coordinates and momenta have approximate numerical values throughout 
all time. Such a state of motion in quantum theory corresponds to the states 
with which classical theory deals. The motion of our wave packet is determined 
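by equations (38) and (43). From these we get, defining $p_s$ as $\partial S/\partial q_s$,

$$\frac{dp_s}{dt} = \frac{d}{dt}\frac{\partial S}{\partial q_s} = \frac{\partial^2 S}{\partial t\,\partial q_s} + \sum_u \frac{\partial^2 S}{\partial q_u\,\partial q_s}\frac{dq_u}{dt}
= -\frac{\partial}{\partial q_s}\,H_c\!\left(q_r, \frac{\partial S}{\partial q_r}\right) + \sum_u \frac{\partial^2 S}{\partial q_u\,\partial q_s}\frac{\partial H_c(q_r, p_r)}{\partial p_u}
= -\frac{\partial H_c(q_r, p_r)}{\partial q_s}, \tag{44}$$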

where in the last line the p’s are counted as independent of the q’s before the partial 
differentiation. Equations (43) and (44) are just the classical equations of motion 
in Hamiltonian form and show that the wave packet moves according to the laws 
of classical mechanics. We see in this way how the classical equations of motion
are derivable from the quantum theory as a limiting case.

By a more accurate solution of the wave equation one can show that 
the accuracy with which the coordinates and momenta simultaneously have 
numerical values cannot remain permanently as favourable as the limit allowed 
by Heisenberg’s principle of uncertainty, equation (56) of §24, but if it is initially
so it will become less favourable, the wave packet undergoing a spreading.†
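
The spreading may be made quantitative in one simple case, which is quoted here as a standard result and is not derived in the text. For a free particle with the Newtonian Hamiltonian (22) in one dimension, a Gaussian packet of initial spread $\sigma_0$ which initially attains the limit $\Delta q\,\Delta p = \hbar/2$ has at time $t$ the spread $\sigma_0\{1 + (\hbar t/2m\sigma_0^2)^2\}^{1/2}$, the spread in momentum remaining $\hbar/2\sigma_0$. The sketch below merely evaluates these formulas; the units `HBAR = 1` and the function names are assumptions.

```python
import math

HBAR = 1.0   # assumed units in which hbar = 1

def packet_width(t, sigma0, mass):
    """Spread in position at time t of a free Gaussian packet of initial spread sigma0."""
    return sigma0 * math.sqrt(1.0 + (HBAR * t / (2.0 * mass * sigma0 ** 2)) ** 2)

def uncertainty_product(t, sigma0, mass):
    """Delta q * Delta p, with Delta p = hbar / (2 sigma0) constant for a free particle."""
    return packet_width(t, sigma0, mass) * HBAR / (2.0 * sigma0)
```

The product starts at `HBAR / 2` and increases steadily with `t`, the spread in position having grown by the factor $\sqrt{2}$ at the time $2m\sigma_0^2/\hbar$.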


32. The action principle*

Equation (10) shows that the Heisenberg dynamical variables at time $t$, $v_t$, are
connected with their values at time $t_0$, $v_{t_0}$ or $v$, by a unitary transformation.
The Heisenberg variables at time $t + \delta t$ are connected with their values at time $t$
by an infinitesimal unitary transformation, as is shown by the equation of motion
(11) or (13), which gives the connexion between $v_{t+\delta t}$ and $v_t$ of the form of (79) or
(80) of §26 with $H_t$ for $F$ and $\delta t/\hbar$ for $\epsilon$. The variation with time of the Heisenberg
dynamical variables may thus be looked upon as the continuous unfolding of
a unitary transformation. In classical mechanics the dynamical variables at time
$t + \delta t$ are connected with their values at time $t$ by an infinitesimal contact
transformation and the whole motion may be looked upon as the continuous 
unfolding of a contact transformation. We have here the mathematical foundation 
of the analogy between the classical and quantum equations of motion, and can 
develop it to bring out the quantum analogue of all the main features of the classical 
theory of dynamics. 

Suppose we have a representation in which the complete set of commuting
observables $\xi$ are diagonal, so that a basic bra is $\langle\xi'|$. We can introduce a second
representation in which the basic bras are
$$\langle\xi'^{*}| = \langle\xi'|\,T. \tag{45}$$
The new basic bras depend on the time $t$ and give us a moving representation,
like a moving system of axes in an ordinary vector space. Comparing (45)
with the conjugate imaginary of (8), we see that the new basic vectors are
just the transforms in the Heisenberg picture of the original basic vectors in
the Schrödinger picture, and hence they must be connected with the Heisenberg
dynamical variables $v_t$ in the same way in which the original basic vectors are
connected with the Schrödinger dynamical variables $v$. In particular, each $\langle\xi'^{*}|$
must be an eigenvector of the $\xi_t$'s belonging to the eigenvalues $\xi'$. It may
therefore be written $\langle\xi'_t|$, with the understanding that the numbers $\xi'_t$ are the same
eigenvalues of the $\xi_t$'s that the $\xi'$'s are of the $\xi$'s. From (45) we get

$$\langle\xi'_t|\xi''\rangle = \langle\xi'|\,T\,|\xi''\rangle, \tag{46}$$
showing that the transformation function is just the representative of $T$ in
the original representation.

Differentiating (45) with respect to $t$ and using (6), we get

$$i\hbar\frac{d}{dt}\langle\xi'_t| = i\hbar\,\langle\xi'|\frac{dT}{dt} = \langle\xi'|\,HT = \langle\xi'_t|\,H_t,$$

†See Kennard, E. H., “Zur Quantenmechanik einfacher Bewegungstypen”, Zeitschrift für
Physik (1927), 44(4-5), pp. 326-352 [doi: 10.1007/BF01391200]; and Darwin, Charles Galton,
“Free motion in the wave mechanics”, Proceedings of the Royal Society of London A, 117 (1927),
pp. 258-293 [doi: 10.1098/rspa.1927.0179].


*This section may be omitted by the student who is not specially concerned with higher 
dynamics. 
with the help of (12). Multiplying on the right by any ket $|a\rangle$ independent of $t$,
we get
$$i\hbar\frac{d}{dt}\langle\xi'_t|a\rangle = \langle\xi'_t|H_t|a\rangle = \int\langle\xi'_t|H_t|\xi''_t\rangle\,d\xi''_t\,\langle\xi''_t|a\rangle, \tag{47}$$


if we take for definiteness the case of continuous eigenvalues for the $\xi$'s.
Now equation (5), written in terms of representatives, reads

$$i\hbar\frac{d}{dt}\langle\xi'|Pt\rangle = \int\langle\xi'|H|\xi''\rangle\,d\xi''\,\langle\xi''|Pt\rangle. \tag{48}$$


Since $\langle\xi'_t|H_t|\xi''_t\rangle$ is the same function of the variables $\xi'_t$ and $\xi''_t$ that $\langle\xi'|H|\xi''\rangle$ is of
$\xi'$ and $\xi''$, equations (47) and (48) are of precisely the same form, with the variables
$\xi'_t$ and $\xi''_t$ in (47) playing the role of the variables $\xi'$ and $\xi''$ in (48) and the function
$\langle\xi'_t|a\rangle$ playing the role of the function $\langle\xi'|Pt\rangle$. We can thus look upon (47) as
a form of Schrödinger's wave equation, with the function $\langle\xi'_t|a\rangle$ of the variables $\xi'_t$
as the wave function. In this way Schrödinger's wave equation appears in
a new light, as the condition on the representative, in the moving representation
with the Heisenberg variables $\xi_t$ diagonal, of the fixed ket corresponding to a state
in the Heisenberg picture. The function $\langle\xi'_t|a\rangle$ owes its variation with time to
its left factor $\langle\xi'_t|$, in contradistinction to the function $\langle\xi'|Pt\rangle$, which owes its
variation with time to its right factor $|Pt\rangle$.


If we put $|a\rangle = |\xi''\rangle$ in (47), we get
$$i\hbar\frac{d}{dt}\langle\xi'_t|\xi''\rangle = \int\langle\xi'_t|H_t|\xi'''_t\rangle\,d\xi'''_t\,\langle\xi'''_t|\xi''\rangle, \tag{49}$$

showing that the transformation function $\langle\xi'_t|\xi''\rangle$ satisfies Schrödinger's wave
equation. Now $\xi_{t_0} = \xi$, so we must have
$$\langle\xi'_{t_0}|\xi''\rangle = \delta(\xi'_{t_0} - \xi''), \tag{50}$$
the $\delta$ function here being understood as the product of a number of factors, one for
each $\xi$-variable, such as occurs for the variables $\xi_{v+1},\ldots,\xi_u$ on the right-hand
side of equation (34) of §16. Thus the transformation function $\langle\xi'_t|\xi''\rangle$ is that
solution of Schrödinger's wave equation for which the $\xi$'s certainly have the values
$\xi''$ at time $t_0$. The square of its modulus, $|\langle\xi'_t|\xi''\rangle|^2$, is the relative probability
of the $\xi$'s having the values $\xi'_t$ at time $t > t_0$ if they certainly have the values $\xi''$
at time $t_0$. We may write $\langle\xi'_t|\xi''\rangle$ as $\langle\xi'_t|\xi''_{t_0}\rangle$ and consider it as depending on
$t_0$ as well as on $t$. To get its dependence on $t_0$ we take the conjugate complex
of equation (49), interchange $t$ and $t_0$, and also interchange single primes and
double primes. This gives

$$-i\hbar\frac{d}{dt_0}\langle\xi'_t|\xi''_{t_0}\rangle = \int\langle\xi'_t|\xi'''_{t_0}\rangle\,d\xi'''_{t_0}\,\langle\xi'''_{t_0}|H_{t_0}|\xi''_{t_0}\rangle. \tag{51}$$


The foregoing discussion of the transformation function $\langle\xi'_t|\xi''\rangle$ is valid with
the $\xi$'s any complete set of commuting observables. The equations were written
down for the case of the $\xi$'s having continuous eigenvalues, but they would still
be valid if any of the $\xi$'s have discrete eigenvalues, provided the necessary formal
changes are made in them. Let us now take a dynamical system having a classical
analogue and let us take the $\xi$'s to be the coordinates $q$. Put
$$\langle q'_t|q''\rangle = e^{iS/\hbar} \tag{52}$$
and so define the function $S$ of the variables $q'_t$, $q''$. This function also depends
explicitly on $t$. (52) is a solution of Schrödinger's wave equation and, if $\hbar$ can be
counted as small, it can be handled in the same way as (35) was. The $S$ of (52)
differs from the $S$ of (35) on account of there being no $A$ in (52), which makes
the $S$ of (52) complex, but the real part of this $S$ equals the $S$ of (35) and its†
imaginary part is of the order $\hbar$. Thus, in the limit $\hbar \to 0$, the $S$ of (52) will equal
that of (35) and will therefore satisfy, corresponding to (38),
$$-\partial S/\partial t = H_c(q'_{rt}, p'_{rt}), \tag{53}$$
where
$$p'_{rt} = \partial S/\partial q'_{rt}, \tag{54}$$
and $H_c$ is the Hamiltonian of the classical analogue of our quantum
dynamical system. But (52) is also a solution of (51) with $q$'s for $\xi$'s, which is
the conjugate complex of Schrödinger's wave equation in the variables $q''$ or $q''_{t_0}$.


This causes $S$ to satisfy also*
$$\partial S/\partial t_0 = H_c(q''_r, p''_r), \tag{55}$$
where
$$p''_r = -\partial S/\partial q''_r. \tag{56}$$


The solution of the Hamilton-Jacobi equations (53) and (55) is the action function
of classical mechanics for the time interval $t_0$ to $t$, i.e. it is the time integral of
the Lagrangian $L$,
$$S = \int_{t_0}^{t} L(t')\,dt'. \tag{57}$$

Thus the $S$ defined by (52) is the quantum analogue of the classical action function
and equals it in the limit $\hbar \to 0$. To get the quantum analogue of the classical
Lagrangian, we pass to the case of an infinitesimal time interval by putting
$t = t_0 + \delta t$ and we then have $\langle q'_{t_0+\delta t}|q''_{t_0}\rangle$ as the analogue of $e^{iL(t_0)\delta t/\hbar}$. For the sake
of the analogy, one should consider $L(t_0)$ as a function of the coordinates $q'$ at time
$t_0 + \delta t$ and the coordinates $q''$ at time $t_0$ rather than as a function of the coordinates
and velocities at time $t_0$ as one usually does.

The principle of least action in classical mechanics says that the action function 
(57) remains stationary for small variations of the trajectory of the system which 
do not alter the end points, i.e. for small variations of the $q$'s at all intermediate
times between $t_0$ and $t$ with $q_{t_0}$ and $q_t$ fixed. Let us see what it corresponds to in
the quantum theory. 


†['pure' omitted.]

*For a more accurate comparison of transformation functions with classical theory, see 
Van Vleck, J. H., “The Correspondence Principle in the Statistical Interpretation of Quantum 
Mechanics” Proceedings of the National Academy of Sciences 14(2), pp. 178-188, (1928) 
[doi: 10.1073/pnas.14.2.178 |. 
Put
$$\exp\left\{i\int_{t_a}^{t_b} L(t)\,dt/\hbar\right\} = \exp\{iS(t_b, t_a)/\hbar\} = B(t_b, t_a), \tag{58}$$

so that $B(t_b, t_a)$ corresponds to $\langle q'_{t_b}|q'_{t_a}\rangle$ in the quantum theory. (We here
allow $q'_{t_a}$ and $q'_{t_b}$ to denote different eigenvalues of $q_{t_a}$ and $q_{t_b}$, to save having
to introduce a large number of primes into the analysis.) Now suppose
the time interval $t_0 \to t$ to be divided up into a large number of small time
intervals $t_0 \to t_1$, $t_1 \to t_2$, ..., $t_{m-1} \to t_m$, $t_m \to t$ by the introduction of a sequence
of intermediate times $t_1, t_2, \ldots, t_m$. Then
$$B(t, t_0) = B(t, t_m)\,B(t_m, t_{m-1})\cdots B(t_2, t_1)\,B(t_1, t_0). \tag{59}$$


The corresponding quantum equation, which follows from the property of basic 
vectors (35) of §16, is 


$$\langle q'_t|q'_0\rangle = \int\!\!\int\cdots\int\langle q'_t|q'_m\rangle\,dq'_m\,\langle q'_m|q'_{m-1}\rangle\,dq'_{m-1}\cdots\langle q'_2|q'_1\rangle\,dq'_1\,\langle q'_1|q'_0\rangle, \tag{60}$$

$q'_k$ being written for $q'_{t_k}$ for brevity. At first sight there does not seem to be any close
correspondence between (59) and (60). We must, however, analyse the meaning of
(59) rather more carefully. We must regard each factor $B$ as a function of the $q$'s
at the two ends of the time interval to which it refers. This makes the right-hand
side of (59) a function, not only of $q_t$ and $q_{t_0}$, but also of all the intermediate $q$'s.
Equation (59) is valid only when we substitute for the intermediate $q$'s in its
right-hand side their values for the real trajectory, small variations in which values
leave $S$ stationary and therefore also, from (58), leave $B(t, t_0)$ stationary. It is
the process of substituting these values for the intermediate $q$'s which corresponds
to the integrations over all values for the intermediate $q'$'s in (60). The quantum
analogue of the action principle is thus absorbed in the composition law (60)
and the classical requirement that the values of the intermediate $q$'s shall make
$S$ stationary corresponds to the condition in quantum mechanics that all values
of the intermediate $q'$'s are important in proportion to their contribution to
the integral in (60).
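
The composition law (60) can be illustrated on a finite-dimensional toy system, where the integrals over intermediate values become sums over intermediate basis states, that is, matrix products. The sketch below is only an illustration under stated assumptions: the six-state system, the random Hermitian Hamiltonian `H`, the helper `evolution` and the units with $\hbar = 1$ are all hypothetical choices, not part of the text. It takes $T = e^{-iH(t-t_0)/\hbar}$ for a time-independent Hamiltonian, so that by (46) the transformation function is a matrix element of $T$.

```python
import numpy as np

hbar = 1.0
rng = np.random.default_rng(0)

# Hypothetical 6-state system with a random Hermitian Hamiltonian.
n = 6
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (A + A.conj().T) / 2


def evolution(dt):
    """Return exp(-i H dt / hbar), built from the eigen-decomposition of H."""
    energies, vecs = np.linalg.eigh(H)
    return (vecs * np.exp(-1j * energies * dt / hbar)) @ vecs.conj().T


# Split the interval t0 -> t into m + 1 equal short steps.
t_total, m = 2.0, 9
step = evolution(t_total / (m + 1))

# Chain the short-interval amplitudes. Each matrix product sums over all
# intermediate basis states, the discrete counterpart of the integrals in (60).
chained = np.eye(n, dtype=complex)
for _ in range(m + 1):
    chained = step @ chained

direct = evolution(t_total)
max_diff = np.max(np.abs(chained - direct))
print(max_diff)
```

The chained product agrees with the single-interval amplitude to rounding error, with every intermediate state contributing to the sum.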

Let us see how (59) can be a limiting case of (60) for h small. We must suppose 
the integrand in (60) to be of the form $e^{iF/\hbar}$, where $F$ is a function of $q'_0, q'_1, q'_2, \ldots,$
$q'_m, q'_t$ which remains continuous as $\hbar$ tends to zero, so that the integrand is a rapidly
oscillating function when $\hbar$ is small. The integral of such a rapidly oscillating
function will be extremely small, except for the contribution arising from a region
in the domain of integration where comparatively large variations in the $q'_k$ produce
only very small variations in $F$. Such a region must be the neighbourhood of
a point where $F$ is stationary for small variations of the $q'_k$. Thus the integral
in (60) is determined essentially by the value of the integrand at a point where
the integrand is stationary for small variations of the intermediate $q'$'s, and so (60)
goes over into (59). 
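
The statement that a rapidly oscillating integral is fixed by the neighbourhood of the stationary point can be checked numerically in one variable. The following is a minimal sketch, assuming a toy phase function $F(q) = (q - q_0)^2$ and a slowly varying Gaussian amplitude `g`; both are illustrative assumptions, as are the value of `hbar` and the grid. It compares a brute-force integral with the standard stationary-phase estimate $g(q_0)\sqrt{2\pi\hbar/|F''|}\,e^{i\pi/4}$, which here equals $g(q_0)\sqrt{\pi\hbar}\,e^{i\pi/4}$.

```python
import numpy as np

hbar = 0.01   # a "small" hbar in arbitrary units (illustrative value)
q0 = 1.0      # stationary point of the assumed phase function F


def F(q):
    return (q - q0) ** 2          # assumed toy phase function, F'' = 2


def g(q):
    return np.exp(-q ** 2 / 2)    # assumed smooth, slowly varying amplitude


# Brute-force integral on a grid fine enough to resolve the oscillations.
q = np.linspace(-9.0, 11.0, 800001)
dq = q[1] - q[0]
numeric = np.sum(g(q) * np.exp(1j * F(q) / hbar)) * dq

# Stationary-phase estimate: only the neighbourhood of q0 contributes.
estimate = g(q0) * np.sqrt(np.pi * hbar) * np.exp(1j * np.pi / 4)

rel_error = abs(numeric - estimate) / abs(estimate)
print(rel_error)
```

The relative error is small for small `hbar`, since contributions away from `q0` cancel by oscillation.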

Equations (54) and (56) express that the variables $q'_t$, $p'_t$ are connected with
the variables $q''$, $p''$ by a contact transformation and are one of the standard
forms of writing the equations of a contact transformation. There is an analogous
form for writing the equations of a unitary transformation in quantum mechanics.
We get from (52), with the help of (45) of §22,

$$\langle q'_t|p_{rt}|q''\rangle = -i\hbar\frac{\partial}{\partial q'_{rt}}\langle q'_t|q''\rangle = \frac{\partial S(q'_t, q'')}{\partial q'_{rt}}\langle q'_t|q''\rangle. \tag{61}$$
Similarly, with the help of (46) of §22,
$$\langle q'_t|p_r|q''\rangle = i\hbar\frac{\partial}{\partial q''_r}\langle q'_t|q''\rangle = -\frac{\partial S(q'_t, q'')}{\partial q''_r}\langle q'_t|q''\rangle. \tag{62}$$
From the general definition of functions of commuting observables, we have
$$\langle q'_t|f(q_t)\,g(q)|q''\rangle = f(q'_t)\,g(q'')\,\langle q'_t|q''\rangle, \tag{63}$$


where $f(q_t)$ and $g(q)$ are functions of the $q_t$'s and $q$'s respectively. Let $G(q_t, q)$
be any function of the $q_t$'s and $q$'s consisting of a sum or integral of terms each
of the form $f(q_t)g(q)$, so that all the $q_t$'s in $G$ occur to the left of all the $q$'s.
Such a function we call well ordered. Applying (63) to each of the terms in $G$ and
adding or integrating, we get
$$\langle q'_t|G(q_t, q)|q''\rangle = G(q'_t, q'')\,\langle q'_t|q''\rangle.$$
Now let us suppose each $p_{rt}$ and $p_r$ can be expressed as a well-ordered function of
the $q_t$'s and $q$'s and write these functions $p_{rt}(q_t, q)$, $p_r(q_t, q)$. Putting these functions
for $G$, we get
$$\langle q'_t|p_{rt}|q''\rangle = p_{rt}(q'_t, q'')\,\langle q'_t|q''\rangle,$$
$$\langle q'_t|p_r|q''\rangle = p_r(q'_t, q'')\,\langle q'_t|q''\rangle.$$
Comparing these equations with (61) and (62) respectively, we see that

$$p_{rt}(q'_t, q'') = \frac{\partial S(q'_t, q'')}{\partial q'_{rt}}, \qquad p_r(q'_t, q'') = -\frac{\partial S(q'_t, q'')}{\partial q''_r}.$$

This means that
$$p_{rt} = \frac{\partial S(q_t, q)}{\partial q_{rt}}, \qquad p_r = -\frac{\partial S(q_t, q)}{\partial q_r}, \tag{64}$$

provided the right-hand sides of (64) are written as well-ordered functions.

These equations are of the same form as (54) and (56), but refer to 
the non-commuting quantum variables $q_t$, $q$ instead of the ordinary algebraic
variables $q'_t$, $q''$. They show how the conditions for a unitary transformation
between quantum variables are analogous to the conditions for a contact 
transformation between classical variables. The analogy is not complete, however, 
because the classical S must be real and there is no simple condition corresponding 
to this for the S of (64). 


33. The Gibbs ensemble 


In our work up to the present we have been assuming all along that 
our dynamical system at each instant of time is in a definite state, that is 
to say, its motion is specified as completely and accurately as is possible without 
conflicting with the general principles of the theory. In the classical theory 
this would mean, of course, that all the coordinates and momenta have specified 
values. Now we may be interested in a motion which is specified to a lesser extent 
than this maximum possible. The present section will be devoted to the methods
to be used in such a case.

The procedure in classical mechanics is to introduce what is called
a Gibbs ensemble, the idea of which is as follows. We consider all
the dynamical coordinates and momenta as Cartesian coordinates in a certain
space, the phase space, whose number of dimensions is twice the number of degrees
of freedom of the system. Any state of the system can then be represented by
a point in this space. This point will move according to the classical equations of
motion (14). Suppose, now, that we are not given that the system is in a definite
state at any time, but only that it is in one or other of a number of possible states
according to a definite probability law. We should then be able to represent it by
a fluid in the phase space, the mass of fluid in any volume of the phase space
being the total probability of the system being in any state whose representative
point lies in that volume. Each particle of the fluid will be moving according
to the equations of motion (14). If we introduce the density $\rho$ of the fluid at
any point, equal to the probability per unit volume of phase space of the system
being in the neighbourhood of the corresponding state, we shall have the equation
of conservation
$$\frac{\partial\rho}{\partial t} = -\sum_r\left\{\frac{\partial}{\partial q_r}\left(\rho\frac{dq_r}{dt}\right) + \frac{\partial}{\partial p_r}\left(\rho\frac{dp_r}{dt}\right)\right\}$$
$$= -\sum_r\left\{\frac{\partial}{\partial q_r}\left(\rho\frac{\partial H}{\partial p_r}\right) - \frac{\partial}{\partial p_r}\left(\rho\frac{\partial H}{\partial q_r}\right)\right\}$$
$$= -[\rho, H]. \tag{65}$$
This may be considered as the equation of motion for the fluid, since it determines
the density $\rho$ for all time if $\rho$ is given initially as a function of the $q$'s and $p$'s.
It is, apart from the minus sign, of the same form as the ordinary equation of
motion (15) for a dynamical variable.
The requirement that the total probability of the system being in any state
shall be unity gives us a normalizing condition for $\rho$,

$$\int\!\!\int\rho\,dq\,dp = 1, \tag{66}$$

the integration being over the whole of phase space and the single differential $dq$
or $dp$ being written to denote the product of all the $dq$'s or $dp$'s. If $\beta$ denotes any
function of the dynamical variables, the average value of $\beta$ will be

$$\int\!\!\int\beta\rho\,dq\,dp. \tag{67}$$

It makes only a trivial alteration in the theory, but often facilitates discussion,
if we work with a density $\rho$ differing from the above one by a positive constant
factor, $k$ say, so that we have instead of (66)

$$\int\!\!\int\rho\,dq\,dp = k.$$


With this density we can picture the fluid as representing a number $k$ of similar
dynamical systems, all following through their motions independently in the same
place, without any mutual disturbance or interaction. The density at any point
would then be the probable or average number of systems in the neighbourhood
of any state per unit volume of phase space, and expression (67) would give
the average total value of $\beta$ for all the systems. Such a set of dynamical systems,
which is the ensemble introduced by Josiah Willard Gibbs, is usually not realizable
in practice, except as a rough approximation, but it forms all the same a useful
theoretical abstraction.

We shall now see that there exists a corresponding density $\rho$ in quantum
mechanics, having properties analogous to the above. It was first introduced by 
John von Neumann. Its existence is rather surprising in view of the fact that 
phase space has no meaning in quantum mechanics, there being no possibility of 
assigning numerical values simultaneously to the q’s and p’s. 

We consider a dynamical system which is at a certain time in one or other of 
a number of possible states according to some given probability law. These states 
may be either a discrete set or a continuous range, or both together. We shall 
here take for definiteness the case of a discrete set and suppose them labelled by 
a parameter $m$. Let the normalized ket vectors corresponding to them be $|m\rangle$ and
let the probability of the system being in the $m$th state be $P_m$. We then define
the quantum density $\rho$ by
$$\rho = \sum_m |m\rangle P_m\langle m|. \tag{68}$$
Let $\rho'$ be any eigenvalue of $\rho$ and $|\rho'\rangle$ an eigenket belonging to this eigenvalue.
Then
$$\sum_m |m\rangle P_m\langle m|\rho'\rangle = \rho|\rho'\rangle = \rho'|\rho'\rangle,$$
so that
$$\sum_m \langle\rho'|m\rangle P_m\langle m|\rho'\rangle = \rho'\langle\rho'|\rho'\rangle$$
or
$$\sum_m P_m|\langle m|\rho'\rangle|^2 = \rho'\langle\rho'|\rho'\rangle.$$

Now $P_m$, being a probability, can never be negative. It follows that $\rho'$ cannot
be negative. Thus $\rho$ has no negative eigenvalues, in analogy with the fact that
the classical density $\rho$ is never negative.

Let us now obtain the equation of motion for our quantum $\rho$. In Schrödinger's
picture the kets and bras in (68) will vary with the time in accordance with
Schrödinger's equation (5) and the conjugate imaginary of this equation, while
the $P_m$'s will remain constant, since the system, so long as it is left undisturbed,
cannot change over from a state corresponding to one ket satisfying Schrödinger's
equation to a state corresponding to another. We thus have
$$i\hbar\frac{d\rho}{dt} = \sum_m i\hbar\left\{\frac{d|m\rangle}{dt}P_m\langle m| + |m\rangle P_m\frac{d\langle m|}{dt}\right\}$$
$$= \sum_m\{H|m\rangle P_m\langle m| - |m\rangle P_m\langle m|H\}$$
$$= H\rho - \rho H. \tag{69}$$
This is the quantum analogue of the classical equation of motion (65).
Our quantum $\rho$, like the classical one, is determined for all time if it is
given initially. 
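
Equation (69) can be checked numerically on a small matrix model. The sketch below is illustrative only: the four-dimensional space, the random Hermitian `H`, the three sample kets and their probabilities `P` are assumptions made for the example. Each ket is evolved by Schrödinger's equation, the density (68) is rebuilt at neighbouring times, and a central difference for $i\hbar\,d\rho/dt$ is compared with $H\rho - \rho H$.

```python
import numpy as np

hbar = 1.0
rng = np.random.default_rng(1)
n = 4

# Assumed toy Hamiltonian: a random Hermitian matrix.
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (A + A.conj().T) / 2

# Three normalized (not necessarily orthogonal) kets |m> with probabilities P_m.
kets = rng.normal(size=(3, n)) + 1j * rng.normal(size=(3, n))
kets /= np.linalg.norm(kets, axis=1, keepdims=True)
P = np.array([0.5, 0.3, 0.2])


def rho_at(t):
    """Density (68) with every ket evolved by Schrodinger's equation to time t."""
    energies, vecs = np.linalg.eigh(H)
    T = (vecs * np.exp(-1j * energies * t / hbar)) @ vecs.conj().T
    evolved = kets @ T.T                      # row m holds T|m>
    return sum(p * np.outer(k, k.conj()) for p, k in zip(P, evolved))


t, dt = 0.7, 1e-5
lhs = 1j * hbar * (rho_at(t + dt) - rho_at(t - dt)) / (2 * dt)   # i hbar d(rho)/dt
rhs = H @ rho_at(t) - rho_at(t) @ H                              # H rho - rho H
residual = np.max(np.abs(lhs - rhs))
print(residual)
```

The residual is limited only by the finite-difference step, and the probabilities `P` stay fixed throughout, as the text requires.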

From the assumption of §12, the average value of any observable $\beta$ when
the system is in the state $m$ is $\langle m|\beta|m\rangle$. Hence if the system is distributed
over the various states $m$ according to the probability law $P_m$, the average value
of $\beta$ will be $\sum_m P_m\langle m|\beta|m\rangle$. If we introduce a representation with a discrete set
of basic ket vectors $|\xi'\rangle$ say, this equals
$$\sum_{m\xi'} P_m\langle m|\xi'\rangle\langle\xi'|\beta|m\rangle = \sum_{\xi' m}\langle\xi'|\beta|m\rangle P_m\langle m|\xi'\rangle = \sum_{\xi'}\langle\xi'|\beta\rho|\xi'\rangle = \sum_{\xi'}\langle\xi'|\rho\beta|\xi'\rangle, \tag{70}$$

the last step being easily verified with the law of matrix multiplication,
equation (44) of §17. The expressions (70) are the analogue of the expression (67)
of the classical theory. Whereas in the classical theory we have to multiply $\beta$ by $\rho$
and take the integral of the product over all phase space, in the quantum theory
we have to multiply $\beta$ by $\rho$, with the factors in either order, and take the diagonal
sum of the product in a representation. If the representation involves a continuous
range of basic vectors $|\xi'\rangle$, we get instead of (70)

$$\int\langle\xi'|\beta\rho|\xi'\rangle\,d\xi' = \int\langle\xi'|\rho\beta|\xi'\rangle\,d\xi', \tag{71}$$

so that we must carry through a process of 'integrating along the diagonal' instead
of summing the diagonal elements. We shall define (71) to be the diagonal
sum of $\beta\rho$ in the continuous case. It can easily be verified, from the properties
of transformation functions (56) of §18, that the diagonal sum is the same for
all representations.

From the condition that the $|m\rangle$'s are normalized we get, with discrete $\xi'$'s,

$$\sum_{\xi'}\langle\xi'|\rho|\xi'\rangle = \sum_{\xi' m}\langle\xi'|m\rangle P_m\langle m|\xi'\rangle = \sum_m P_m = 1, \tag{72}$$

since the total probability of the system being in any state is unity. This is
the analogue of equation (66). The probability of the system being in the state
$\xi'$, or the probability of the observables $\xi$ which are diagonal in the representation
having the values $\xi'$, is, according to the rule for interpreting representatives of
kets (51) of §18,
$$\sum_m |\langle\xi'|m\rangle|^2 P_m = \langle\xi'|\rho|\xi'\rangle, \tag{73}$$
which gives us a meaning for each term in the sum on the left-hand side of (72).
For continuous $\xi'$'s, the right-hand side of (73) gives the probability of the $\xi$'s
having values in the neighbourhood of $\xi'$ per unit range of variation of the values $\xi'$.
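
The properties of the quantum density collected in (68), (70), (72) and (73) are easy to confirm for a small matrix example. The following is a sketch under assumptions: the five-dimensional space, the three random normalized kets, the probabilities `P` and the random Hermitian observable `beta` are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5

# Three normalized kets |m> (rows) and their probabilities P_m (assumed values).
kets = rng.normal(size=(3, n)) + 1j * rng.normal(size=(3, n))
kets /= np.linalg.norm(kets, axis=1, keepdims=True)
P = np.array([0.6, 0.3, 0.1])

# Equation (68): rho = sum_m |m> P_m <m|.
rho = sum(p * np.outer(k, k.conj()) for p, k in zip(P, kets))

# A random Hermitian matrix standing in for the observable beta.
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
beta = (B + B.conj().T) / 2

eigenvalues = np.linalg.eigvalsh(rho)                 # should not be negative
trace_rho = np.trace(rho).real                        # equation (72)
avg_direct = sum(p * (k.conj() @ beta @ k) for p, k in zip(P, kets)).real
avg_beta_rho = np.trace(beta @ rho).real              # equation (70), first form
avg_rho_beta = np.trace(rho @ beta).real              # equation (70), second form
diag_probs = np.diag(rho).real                        # right-hand side of (73)
expected_diag = np.sum(P[:, None] * np.abs(kets) ** 2, axis=0)

print(eigenvalues, trace_rho, avg_direct, avg_beta_rho)
```

Up to rounding, the eigenvalues are non-negative, the diagonal sum is unity, and both orderings of the factors in (70) give the same average.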

As in the classical theory, we may take a density equal to $k$ times the above $\rho$
and consider it as representing a Gibbs ensemble of $k$ similar dynamical systems,
between which there is no mutual disturbance or interaction. We shall then have $k$
on the right-hand side of (72), and (70) or (71) will give the total average $\beta$ for all
the members of the ensemble, while (73) will give the total probability of a member
of the ensemble having values for its $\xi$'s equal to $\xi'$ or in the neighbourhood of $\xi'$
per unit range of variation of the values $\xi'$.

An important application of the Gibbs ensemble is to a dynamical system
in thermodynamic equilibrium with its surroundings at a given temperature $T$.
Gibbs showed that such a system is represented in classical mechanics by
the density
$$\rho = c\,e^{-H/kT}, \tag{74}$$
$H$ being the Hamiltonian, which is now independent of the time, $k$ being
Boltzmann's constant, and $c$ being a number chosen to make the normalizing
condition (66) hold. This formula may be taken over unchanged into
the quantum theory. At high temperatures, (74) becomes $\rho = c$, which gives,
on being substituted into the right-hand side of (73), $c\langle\xi'|\xi'\rangle = c$ in the case
of discrete $\xi'$'s. This shows that at high temperatures all discrete states are
equally probable. 
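
The high-temperature limit of (74) can be seen directly for a handful of discrete levels. The sketch below assumes a hypothetical set of four energy levels and units in which Boltzmann's constant is 1; the function name `thermal_probabilities` is likewise only illustrative. In the representation with $H$ diagonal, the diagonal elements of $\rho$ are $c\,e^{-H'/kT}$, with $c$ fixed by normalization.

```python
import numpy as np

k_B = 1.0                                    # Boltzmann's constant in assumed units
energies = np.array([0.5, 1.5, 2.5, 3.5])    # hypothetical discrete energy levels


def thermal_probabilities(T):
    """Diagonal elements of rho = c exp(-H/kT) in the representation with H diagonal."""
    weights = np.exp(-energies / (k_B * T))
    return weights / weights.sum()           # dividing by the sum fixes the constant c


cold = thermal_probabilities(0.2)
hot = thermal_probabilities(1.0e4)
print(cold)
print(hot)
```

At the low temperature nearly all the probability sits in the lowest level, while at the high temperature the four levels are almost equally probable.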
34. The harmonic oscillator
A SIMPLE and interesting example of a dynamical system in quantum mechanics
is the harmonic oscillator. This example is of importance for general theory,
because it forms a corner-stone in the theory of radiation. The dynamical variables
needed for describing the system are just one coordinate $q$ and its conjugate
momentum $p$. The Hamiltonian in classical mechanics is
$$H = \frac{1}{2m}(p^2 + m^2\omega^2 q^2), \tag{1}$$

where $m$ is the mass of the oscillating particle and $\omega$ is $2\pi$ times the frequency.
We assume the same Hamiltonian in quantum mechanics. This Hamiltonian,
together with the quantum condition (10) of §22, defines the system completely.

The Heisenberg equations of motion are

$$\dot{q}_t = [q_t, H] = p_t/m, \qquad \dot{p}_t = [p_t, H] = -m\omega^2 q_t. \tag{2}$$
It is convenient to introduce the dimensionless complex dynamical variable

$$\eta = (2m\hbar\omega)^{-1/2}(p + im\omega q). \tag{3}$$
The equations of motion (2) give
$$\dot{\eta}_t = (2m\hbar\omega)^{-1/2}(-m\omega^2 q_t + i\omega p_t) = i\omega\eta_t.$$
This equation can be integrated to give
$$\eta_t = \eta_0 e^{i\omega t}, \tag{4}$$

where $\eta_0$ is a linear operator independent of $t$, and is equal to the value of $\eta_t$ at
time $t = 0$. The above equations are all as in the classical theory.
We can express $q$ and $p$ in terms of $\eta$ and its conjugate complex $\bar{\eta}$ and may
thus work entirely in terms of $\eta$ and $\bar{\eta}$. We have
$$\hbar\omega\,\eta\bar{\eta} = (2m)^{-1}(p + im\omega q)(p - im\omega q)$$
$$= (2m)^{-1}[p^2 + m^2\omega^2 q^2 + im\omega(qp - pq)]$$
$$= H - \tfrac{1}{2}\hbar\omega \tag{5}$$
and similarly
$$\hbar\omega\,\bar{\eta}\eta = H + \tfrac{1}{2}\hbar\omega. \tag{6}$$
Thus
$$\bar{\eta}\eta - \eta\bar{\eta} = 1. \tag{7}$$

Equation (5) or (6) gives $H$ in terms of $\eta$ and $\bar{\eta}$ and (7) gives the commutation
relation connecting $\eta$ and $\bar{\eta}$. From (5)
$$\hbar\omega\,\bar{\eta}\eta\bar{\eta} = \bar{\eta}H - \tfrac{1}{2}\hbar\omega\,\bar{\eta}$$
and from (6)
$$\hbar\omega\,\bar{\eta}\eta\bar{\eta} = H\bar{\eta} + \tfrac{1}{2}\hbar\omega\,\bar{\eta}.$$
Thus
$$\bar{\eta}H - H\bar{\eta} = \hbar\omega\,\bar{\eta}. \tag{8}$$
Also, (7) leads to
$$\bar{\eta}\eta^n - \eta^n\bar{\eta} = n\eta^{n-1} \tag{9}$$
for any positive integer $n$, as may be verified by induction, since, by multiplying
(9) by $\eta$ on the left, we can deduce (9) with $n + 1$ for $n$.
Let $H'$ be an eigenvalue of $H$ and $|H'\rangle$ an eigenket belonging to it. From (5)
$$\hbar\omega\,\langle H'|\eta\bar{\eta}|H'\rangle = \langle H'|H - \tfrac{1}{2}\hbar\omega|H'\rangle = (H' - \tfrac{1}{2}\hbar\omega)\langle H'|H'\rangle.$$
Now $\langle H'|\eta\bar{\eta}|H'\rangle$ is the square of the length of the ket $\bar{\eta}|H'\rangle$, and hence
$$\langle H'|\eta\bar{\eta}|H'\rangle \ge 0,$$

the case of equality occurring only if $\bar{\eta}|H'\rangle = 0$. Also $\langle H'|H'\rangle > 0$. Thus

$$H' \ge \tfrac{1}{2}\hbar\omega, \tag{10}$$
the case of equality occurring only if $\bar{\eta}|H'\rangle = 0$. From the form (1) of $H$ as
a sum of squares, we should expect its eigenvalues to be all positive or zero
(since the average value of $H$ for any state must be positive or zero). We now
have the more stringent condition (10).

From (8)
$$H\bar{\eta}|H'\rangle = (\bar{\eta}H - \hbar\omega\,\bar{\eta})|H'\rangle = (H' - \hbar\omega)\,\bar{\eta}|H'\rangle. \tag{11}$$
Now if $H' \ne \tfrac{1}{2}\hbar\omega$, $\bar{\eta}|H'\rangle$ is not zero and is then according to (11) an eigenket
of $H$ belonging to the eigenvalue $H' - \hbar\omega$. Thus, with $H'$ any eigenvalue of
$H$ not equal to $\tfrac{1}{2}\hbar\omega$, $H' - \hbar\omega$ is another eigenvalue of $H$. We can repeat
the argument and infer that, if $H' - \hbar\omega \ne \tfrac{1}{2}\hbar\omega$, $H' - 2\hbar\omega$ is another eigenvalue
of $H$. Continuing in this way, we obtain the series of eigenvalues $H'$, $H' - \hbar\omega$,
$H' - 2\hbar\omega$, $H' - 3\hbar\omega$, ..., which cannot extend to infinity, because then it would
contain eigenvalues contradicting (10), and can terminate only with the value $\tfrac{1}{2}\hbar\omega$.
Again, from the conjugate complex of equation (8)

$$H\eta|H'\rangle = (\eta H + \hbar\omega\,\eta)|H'\rangle = (H' + \hbar\omega)\,\eta|H'\rangle,$$

showing that $H' + \hbar\omega$ is another eigenvalue of $H$, with $\eta|H'\rangle$ as an eigenket
belonging to it, unless $\eta|H'\rangle = 0$. The latter alternative can be ruled out,
since it would lead to
$$0 = \hbar\omega\,\bar{\eta}\eta|H'\rangle = (H + \tfrac{1}{2}\hbar\omega)|H'\rangle = (H' + \tfrac{1}{2}\hbar\omega)|H'\rangle,$$
which contradicts (10). Thus $H' + \hbar\omega$ is always another eigenvalue of $H$, and so
are $H' + 2\hbar\omega$, $H' + 3\hbar\omega$ and so on. Hence the eigenvalues of $H$ are the sequence*
of numbers
$$\tfrac{1}{2}\hbar\omega,\ \tfrac{3}{2}\hbar\omega,\ \tfrac{5}{2}\hbar\omega,\ \tfrac{7}{2}\hbar\omega,\ \ldots \tag{12}$$
extending to infinity. These are the possible energy values for
the harmonic oscillator. 
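
The sequence (12) can be cross-checked independently of the operator argument by diagonalizing the Hamiltonian (1) on a grid. The sketch below is an approximation under assumptions: units with $\hbar = m = \omega = 1$, a finite grid in $q'$ with the wave function taken to vanish outside it, and a second-difference approximation to $d^2/dq'^2$ (so that $p^2$ becomes $-\hbar^2\,d^2/dq'^2$ in Schrödinger's representation). The variable names are illustrative.

```python
import numpy as np

hbar = mass = omega = 1.0          # assumed units
N = 801
q = np.linspace(-10.0, 10.0, N)
dq = q[1] - q[0]

# Second-difference approximation to d^2/dq^2; the wave function is taken
# to vanish outside the grid.
off_diagonal = np.full(N - 1, 1.0)
D2 = (np.diag(off_diagonal, -1) - 2.0 * np.eye(N) + np.diag(off_diagonal, 1)) / dq**2

# H = p^2 / 2m + m omega^2 q^2 / 2, with p^2 -> -hbar^2 d^2/dq^2.
H = -(hbar**2 / (2 * mass)) * D2 + np.diag(0.5 * mass * omega**2 * q**2)

lowest = np.linalg.eigvalsh(H)[:4]
expected = (np.arange(4) + 0.5) * hbar * omega
print(lowest)
print(expected)
```

The four lowest numerical eigenvalues agree with $\tfrac{1}{2}\hbar\omega$, $\tfrac{3}{2}\hbar\omega$, $\tfrac{5}{2}\hbar\omega$, $\tfrac{7}{2}\hbar\omega$ to within the discretization error of the grid.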

Let $|0\rangle$ be an eigenket of $H$ belonging to the lowest eigenvalue $\tfrac{1}{2}\hbar\omega$, so that

$$\bar{\eta}|0\rangle = 0, \tag{13}$$
and form the sequence of kets
$$|0\rangle,\ \eta|0\rangle,\ \eta^2|0\rangle,\ \eta^3|0\rangle,\ \ldots. \tag{14}$$
These kets are all eigenkets of $H$, belonging to the sequence of eigenvalues (12)
respectively. From (9) and (13)
$$\bar{\eta}\eta^n|0\rangle = n\eta^{n-1}|0\rangle \tag{15}$$

for any non-negative integer $n$. Thus the set of kets (14) is such that $\eta$ or $\bar{\eta}$ applied
to any one of the set gives a ket dependent on the set. Now all the dynamical
variables in our problem are expressible in terms of $\eta$ and $\bar{\eta}$, so the kets (14) must


*['sequence' replaces 'series'.]
form a complete set (otherwise there would be some more dynamical variables).
There is just one of these kets for each eigenvalue (12) of $H$, so $H$ by itself forms
a complete commuting set of observables. The kets (14) correspond to the various
stationary states of the oscillator. The stationary state with energy $(n + \tfrac{1}{2})\hbar\omega$,
corresponding to $\eta^n|0\rangle$, is called the $n$th quantum state.

The square of the length of the ket $\eta^n|0\rangle$ is
$$\langle 0|\bar{\eta}^n\eta^n|0\rangle = n\,\langle 0|\bar{\eta}^{n-1}\eta^{n-1}|0\rangle$$
with the help of (15). By induction, we find that
$$\langle 0|\bar{\eta}^n\eta^n|0\rangle = n! \tag{16}$$
provided $|0\rangle$ is normalized. Thus the kets (14) multiplied by the coefficients
$n!^{-1/2}$ with $n = 0, 1, 2, \ldots$, respectively form the basic kets of a representation,
namely the representation with $H$ diagonal. Any ket $|x\rangle$ can be expanded in
the form†
$$|x\rangle = \sum_{n=0}^{\infty} x_n\eta^n|0\rangle, \tag{17}$$

where the $x_n$'s are numbers. In this way the ket $|x\rangle$ is put into correspondence
with a power series $\sum x_n\eta^n$ in the variable $\eta$, the various terms in the power series
corresponding to the various stationary states. If $|x\rangle$ is normalized, it defines
a state for which the probability of the oscillator being in the $n$th quantum state,
i.e. the probability of $H$ having the value $(n + \tfrac{1}{2})\hbar\omega$, is
$$P_n = n!\,|x_n|^2, \tag{18}$$
as follows from the same argument which led to (51) of §18. 
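
Relations (7), (16) and (18) can be checked with finite matrices. The sketch below rests on assumptions that are not in the text: the matrices are written in the representation with $H$ diagonal, using the normalized basic kets $n!^{-1/2}\eta^n|0\rangle$, and are truncated to `N` rows, so that (7) fails in the last row only; the coefficients $x_n \propto \alpha^n/n!$ are an arbitrary illustrative choice.

```python
import numpy as np
from math import factorial

N = 12                                              # truncation size (an assumption)
eta_bar = np.diag(np.sqrt(np.arange(1, N)), k=1)    # lowers n by one
eta = eta_bar.T                                     # raises n by one

ket0 = np.zeros(N)
ket0[0] = 1.0

# (7): eta_bar eta - eta eta_bar = 1, exact except in the last truncated row.
commutator = eta_bar @ eta - eta @ eta_bar

# (16): the squared length of eta^n |0> equals n!.
norms_sq = []
ket = ket0.copy()
for n in range(6):
    norms_sq.append(ket @ ket)
    ket = eta @ ket

# (17) and (18): expand a ket as sum_n x_n eta^n |0> and read off P_n = n! |x_n|^2.
alpha = 0.8                                         # arbitrary illustrative parameter
x = np.array([alpha**n / factorial(n) for n in range(N)])
ket_x = np.zeros(N)
power = ket0.copy()
for n in range(N):
    ket_x += x[n] * power
    power = eta @ power
norm = np.linalg.norm(ket_x)
x, ket_x = x / norm, ket_x / norm                   # normalize |x>

P = np.array([factorial(n) for n in range(N)]) * x**2
print(norms_sq)
print(P.sum())
```

The probabilities $P_n$ sum to unity and coincide with the squared components of the normalized ket along the basic kets of the representation.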

We may consider the ket $|0\rangle$ as a standard ket and the power series in $\eta$
as a wave function, since any ket can be expressed as such a wave function
multiplied into this standard ket. The present kind of wave function differs from
the usual kind, introduced by equations (62) of §20, in that it is a function of
the complex dynamical variable $\eta$ instead of observables. It is, however, for many
purposes the most convenient wave function for describing states of the harmonic
oscillator. The standard ket $|0\rangle$ satisfies the condition (13), which replaces
the conditions (43) of §22 for the standard ket in Schrödinger's representation.§

Let us introduce Schrödinger's representation with $q$ diagonal and obtain
the representatives of the stationary states. From (13) and (3)

$$(p - im\omega q)|0\rangle = 0,$$
so
$$\langle q'|p - im\omega q|0\rangle = 0.$$

With the help of (45) of §22, this gives
$$\hbar\frac{\partial}{\partial q'}\langle q'|0\rangle + m\omega q'\,\langle q'|0\rangle = 0. \tag{19}$$

The solution of this differential equation is
$$\langle q'|0\rangle = (m\omega/\pi\hbar)^{1/4}\,e^{-m\omega q'^2/2\hbar}, \tag{20}$$


the numerical coefficient being chosen so as to make |0) normalized. We have 
here the representative of the normal state, as the state of lowest energy is called. 
The representatives of the other stationary states can be obtained from it. We have 
from (3) 


†[The index of summation n is stated.]
§[The paragraph is redrafted in the fourth edition.]


$$\langle q'|\eta^n|0\rangle = (2m\hbar\omega)^{-n/2}\,\langle q'|(p + im\omega q)^n|0\rangle$$

$$= (2m\hbar\omega)^{-n/2}\,i^n\left(-\hbar\frac{\partial}{\partial q'} + m\omega q'\right)^n\langle q'|0\rangle$$

$$= i^n(2m\hbar\omega)^{-n/2}(m\omega/\pi\hbar)^{1/4}\left(-\hbar\frac{\partial}{\partial q'} + m\omega q'\right)^n e^{-m\omega q'^2/2\hbar}. \tag{21}$$

This may easily be worked out for small values of $n$. The result is of the form of
$e^{-m\omega q'^2/2\hbar}$ times a power series of degree $n$ in $q'$. A further factor $n!^{-1/2}$ must be
inserted in (21) to get the normalized representative of the $n$th quantum state.
The factor $i^n$ may be discarded, being merely a phase factor.
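
Working (21) out for small $n$ can be mechanized. The sketch below assumes units with $\hbar = m = \omega = 1$, in which the operator in (21) is $(-d/dq' + q')$; acting on $P(q')\,e^{-q'^2/2}$ it gives $(2q'P - P')\,e^{-q'^2/2}$, so the polynomial factor obeys a simple recursion. The helper names are illustrative, and the resulting polynomials are the ones usually called Hermite polynomials.

```python
import numpy as np
from math import factorial, pi
from numpy.polynomial import Polynomial

q = Polynomial([0.0, 1.0])   # the polynomial "q'" itself


def polynomial_factor(n):
    """Return P_n with (-d/dq + q)^n e^{-q^2/2} = P_n(q) e^{-q^2/2}."""
    P = Polynomial([1.0])
    for _ in range(n):
        P = 2 * q * P - P.deriv()
    return P


def normalized_representative(n, grid):
    """Equation (21) with the extra factor n!^{-1/2}, dropping the phase i^n."""
    prefactor = (2.0 ** n * factorial(n)) ** -0.5 * pi ** -0.25
    return prefactor * polynomial_factor(n)(grid) * np.exp(-grid ** 2 / 2)


grid = np.linspace(-10.0, 10.0, 4001)
dgrid = grid[1] - grid[0]
psi3 = normalized_representative(3, grid)
norm3 = np.sum(psi3 ** 2) * dgrid
print(polynomial_factor(3).coef)   # coefficients in increasing powers of q'
print(norm3)
```

For $n = 3$ the polynomial factor is $8q'^3 - 12q'$, and the representative integrates to unity once the factor $n!^{-1/2}$ is included.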


35. Angular momentum 
Let us consider a particle described by the three Cartesian coordinates $x$, $y$, $z$ and
their conjugate momenta $p_x$, $p_y$, $p_z$. Its angular momentum about the origin is
defined as in the classical theory, by

$$m_x = yp_z - zp_y, \qquad m_y = zp_x - xp_z, \qquad m_z = xp_y - yp_x, \tag{22}$$
or by the vector equation $\mathbf{m} = \mathbf{x}\times\mathbf{p}$.
We must evaluate the P.B.s of the angular momentum components with
the dynamical variables $x$, $p_x$, etc., and with each other. This we can do most
conveniently with the help of the laws (4) and (5) of §21, thus

$$[m_z, x] = [xp_y - yp_x, x] = -y[p_x, x] = y,$$
$$[m_z, y] = [xp_y - yp_x, y] = x[p_y, y] = -x, \tag{23}$$
$$[m_z, z] = [xp_y - yp_x, z] = 0, \tag{24}$$
and similarly
$$[m_z, p_x] = p_y, \qquad [m_z, p_y] = -p_x, \tag{25}$$
$$[m_z, p_z] = 0, \tag{26}$$
with corresponding relations for $m_x$ and $m_y$. Again
$$[m_y, m_z] = [zp_x - xp_z, m_z] = z[p_x, m_z] - [x, m_z]p_z = -zp_y + yp_z = m_x,$$
$$[m_z, m_x] = m_y, \qquad [m_x, m_y] = m_z. \tag{27}$$

These results are all the same as in the classical theory. The sign in the results (23),
(25) and (27) may easily be remembered from the rule that the + sign occurs when
the three dynamical variables, consisting of the two in the P.B. on the left-hand
side and the one forming the result on the right, are in the cyclic order (xyz) and
the − sign occurs otherwise. Equations (27) may be put in the vector form
$$\mathbf{m}\times\mathbf{m} = i\hbar\,\mathbf{m}. \tag{28}$$
Now suppose we have several particles with angular momenta $\mathbf{m}_1, \mathbf{m}_2, \ldots$.
Each of these angular momentum vectors will satisfy (28), thus
$$\mathbf{m}_r\times\mathbf{m}_r = i\hbar\,\mathbf{m}_r,$$
and any one of them will commute* with any other, so that
$$\mathbf{m}_r\times\mathbf{m}_s + \mathbf{m}_s\times\mathbf{m}_r = 0 \qquad (r \ne s).$$
Hence if $\mathbf{M} = \sum_r\mathbf{m}_r$ is the total angular momentum,

$$\mathbf{M}\times\mathbf{M} = \sum_{rs}\mathbf{m}_r\times\mathbf{m}_s = \sum_r\mathbf{m}_r\times\mathbf{m}_r + \sum_{r<s}(\mathbf{m}_r\times\mathbf{m}_s + \mathbf{m}_s\times\mathbf{m}_r)$$

$$= i\hbar\sum_r\mathbf{m}_r = i\hbar\,\mathbf{M}. \tag{29}$$


This result is of the same form as (28), so that the components of the total angular 
momentum M of any number of particles satisfy the same commutation relations 
as those of the angular momentum of a single particle. 
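
The passage from (28) to (29) can be illustrated with matrices. The sketch below is an assumption-laden example rather than the orbital operators of (22), which act on an infinite-dimensional space: it uses the hypothetical $3\times 3$ matrices $(m_k)_{ij} = -i\hbar\,\epsilon_{kij}$, which satisfy (28), for each of two independent systems, and builds the total $\mathbf{M}$ with Kronecker products so that the two sets commute with each other. The component form of (28) used in the check is $m_x m_y - m_y m_x = i\hbar m_z$ and its cyclic permutations.

```python
import numpy as np

hbar = 1.0

# Levi-Civita symbol eps[i, j, k].
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k] = 1.0
    eps[i, k, j] = -1.0

# Assumed 3x3 matrices obeying (28): (m_k)_{ij} = -i hbar eps_{kij}.
m = -1j * hbar * eps                       # m[k] is the matrix for component k

# Two independent systems: components of one commute with those of the other.
I3 = np.eye(3)
m1 = np.array([np.kron(m[k], I3) for k in range(3)])
m2 = np.array([np.kron(I3, m[k]) for k in range(3)])
M = m1 + m2                                # total angular momentum


def commutator(a, b):
    return a @ b - b @ a


def satisfies_28(L):
    """Check L_a L_b - L_b L_a = i hbar L_c for cyclic (a, b, c)."""
    return all(
        np.allclose(commutator(L[a], L[b]), 1j * hbar * L[c])
        for a, b, c in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]
    )


print(satisfies_28(m), satisfies_28(M))
```

Both the single-system matrices and the total satisfy the same relations, because the cross terms between the two systems cancel in pairs.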

Let $A_x$, $A_y$, $A_z$ denote the three coordinates of any one of the particles, or else
the three components of momentum of one of the particles. The $A$'s will commute
with the angular momenta of the other particles, and hence from (23), (24), (25)
and (26)
$$[M_z, A_x] = A_y, \qquad [M_z, A_y] = -A_x, \qquad [M_z, A_z] = 0. \tag{30}$$
If $B_x$, $B_y$, $B_z$ are a second set of three quantities denoting the coordinates or
momentum components of one of the particles, they will satisfy similar relations
to (30). We shall then have

$$[M_z, A_xB_x + A_yB_y + A_zB_z]$$
$$= [M_z, A_x]B_x + A_x[M_z, B_x] + [M_z, A_y]B_y + A_y[M_z, B_y]$$
$$= A_yB_x + A_xB_y - A_xB_y - A_yB_x$$
$$= 0.$$
Thus the scalar product $A_xB_x + A_yB_y + A_zB_z$ commutes with $M_z$, and similarly
with $M_x$ and $M_y$. Introduce the vector product
$$\mathbf{A}\times\mathbf{B} = \mathbf{C}$$
or
$$A_yB_z - A_zB_y = C_x, \qquad A_zB_x - A_xB_z = C_y, \qquad A_xB_y - A_yB_x = C_z.$$
We have
$$[M_z, C_x] = -A_xB_z + A_zB_x = C_y$$
and similarly
$$[M_z, C_y] = -C_x, \qquad [M_z, C_z] = 0.$$

These equations are again of the form (30), with C for A. We can conclude from 
this work that equations of the form (30) hold for the three components of any 
vector that we can construct from our dynamical variables, and that any scalar 
commutes with M. 

We can introduce linear operators R referring to rotations about the origin in 
the same way in which we introduced the linear operators D in §25 referring to 
displacements. Taking a rotation through an angle $\delta\phi$ about the z-axis and making
$\delta\phi$ infinitesimal, we can obtain the limit operator corresponding to (64) of §25,

$$\lim_{\delta\phi\to 0}(R - 1)/\delta\phi,$$
which we shall call the rotation operator about the z-axis and denote by $r_z$.


*[This 'vector product' anticommutes. The commutability is with respect to P.B.s.]
Like the displacement operators, $r_z$ is* an imaginary linear operator and is
undetermined to the extent of an arbitrary† imaginary number. Corresponding to
(66) of §25, the change in any dynamical variable $v$ caused by a rotation through
a small angle $\delta\phi$ about the z-axis is
$$\delta\phi\,(r_z v - v r_z), \tag{31}$$
to the first order in $\delta\phi$. Now the changes produced in the three components
$A_x$, $A_y$, $A_z$ of a vector by a (right-handed) rotation $\delta\phi$ about the z-axis applied
to all measuring apparatus are $\delta\phi A_y$, $-\delta\phi A_x$ and 0 respectively, and any scalar
quantity is unchanged by the rotation. Equating these changes to (31), we find that
$$r_zA_x - A_xr_z = A_y, \qquad r_zA_y - A_yr_z = -A_x,$$
$$r_zA_z - A_zr_z = 0,$$

and $r_z$ commutes with any scalar. Comparing these results with (30), we see that
$i\hbar r_z$ satisfies the same commutation relations as $M_z$. Their difference, $M_z - i\hbar r_z$,
commutes with all the dynamical variables and must therefore be a number.
This number, which is necessarily real since $M_z$ and $i\hbar r_z$ are real, may be made
zero by a suitable choice of the arbitrary† imaginary number that can be added
to $r_z$. We then have the result
$$M_z = i\hbar r_z. \tag{32}$$
Similar equations hold for $M_x$ and $M_y$. They are the analogues of (69) of §25.
Thus the total angular momentum is connected with the rotation operators as
the total momentum is connected with the displacement operators. This conclusion
is valid for any point as origin.

The above argument applies to the angular momentum arising from the motion 
of particles, defined by (22) for each particle. There is another kind of angular 
momentum occurring in atomic theory, spin angular momentum. The former kind 
of angular momentum will be called orbital angular momentum, to distinguish it. 
The spin angular momentum of a particle should be pictured as due to some 
internal motion of the particle, so that it is associated with different degrees of 
freedom from those describing the motion of the particle as a whole, and hence 
the dynamical variables that describe the spin must commute with $x$, $y$, $z$,
$p_x$, $p_y$ and $p_z$. The spin does not correspond very closely to anything in classical
mechanics, so the method of classical analogy is not suitable for studying it.
However, we can build up a theory of the spin simply from the assumption that
the components of the spin angular momentum are connected with the rotation
operators in the same way as we had above for orbital angular momentum,
i.e. equation (32) holds with $M_z$ as the z component of the spin angular momentum
of a particle and $r_z$ as the rotation operator about the z-axis referring to states of
spin of that particle. With this assumption, the commutation relations connecting
the components of the spin angular momentum $\mathbf{M}$ with any vector $\mathbf{A}$ referring
to the spin must be of the standard form (30), and hence, taking $\mathbf{A}$ to be
the spin angular momentum itself, we have equation (29) holding also for the spin.
We now have (29) holding quite generally, for any sum of spin and orbital angular
momenta, and also (30) will hold generally, for $\mathbf{M}$ the total spin and orbital angular
momentum and $\mathbf{A}$ any vector dynamical variable, and the connexion between
angular momentum and rotation operators will be always valid.

‡[‘pure’ omitted.]
§[‘additive pure’ omitted.]

As an immediate consequence of this connexion, we can deduce the law of 
conservation of angular momentum. For an isolated system, the Hamiltonian 
must be unchanged by any rotation about the origin, in other words it must 
be a scalar, so it must commute with the angular momentum about the origin. 
Thus the angular momentum is a constant of the motion. For this argument 
the origin may be any point. 

As a second immediate consequence, we can deduce that a state with zero
total angular momentum is spherically symmetrical. The state will correspond to
a ket $|S\rangle$, say, satisfying
$$M_x|S\rangle = M_y|S\rangle = M_z|S\rangle = 0,$$
and hence
$$r_x|S\rangle = r_y|S\rangle = r_z|S\rangle = 0.$$
This shows that the ket $|S\rangle$ is unaltered by infinitesimal rotations, and it must
therefore be unaltered by finite rotations, since the latter can be built up from
infinitesimal ones. Thus the state is spherically symmetrical. The converse
theorem, a spherically symmetrical state has zero total angular momentum,
is also true, though its proof is not quite so simple. A spherically symmetrical
state corresponds to a ket $|S\rangle$ whose direction is unaltered by any rotation.
Thus the change in $|S\rangle$ produced by a rotation operator $r_x$, $r_y$ or $r_z$ must be
a numerical multiple of $|S\rangle$, say
$$r_x|S\rangle = c_x|S\rangle, \qquad r_y|S\rangle = c_y|S\rangle, \qquad r_z|S\rangle = c_z|S\rangle,$$
where the c's are numbers. This gives
$$M_x|S\rangle = i\hbar c_x|S\rangle, \qquad M_y|S\rangle = i\hbar c_y|S\rangle, \qquad M_z|S\rangle = i\hbar c_z|S\rangle. \tag{33}$$
These equations are not consistent with the commutation relations (29) for $M_x$,
$M_y$ and $M_z$ unless $c_x = c_y = c_z = 0$, in which case the state has zero total
angular momentum. We have in (33) an example of a ket which is simultaneously
an eigenket of three non-commuting linear operators $M_x$, $M_y$ and $M_z$ and this is
possible only if all three eigenvalues are zero.†

†[This paragraph is substantially redrafted in the fourth edition.]


36. Properties of angular momentum

There are some general properties of angular momentum, deducible simply from
the commutation relations between the three components. These properties must
hold equally for spin and orbital angular momentum. Let $m_x$, $m_y$, $m_z$ be the three
components of an angular momentum, and introduce the quantity $\beta$ defined by
$$\beta = m_x^2 + m_y^2 + m_z^2.$$
Since $\beta$ is a scalar it must commute with $m_x$, $m_y$ and $m_z$. Let us suppose
we have a dynamical system for which $m_x$, $m_y$ and $m_z$ are the only dynamical
variables. Then $\beta$ commutes with everything and must be a number. We can
study this dynamical system on much the same lines as we used for the harmonic
oscillator in §34.
Put
$$m_x - i m_y = \eta.$$
From the commutation relations (27) we get
$$\bar\eta\eta = (m_x + i m_y)(m_x - i m_y) = m_x^2 + m_y^2 - i(m_x m_y - m_y m_x) = \beta - m_z^2 + \hbar m_z \tag{34}$$
and similarly
$$\eta\bar\eta = \beta - m_z^2 - \hbar m_z. \tag{35}$$
Thus
$$\bar\eta\eta - \eta\bar\eta = 2\hbar m_z. \tag{36}$$
Also
$$m_z\eta - \eta m_z = i\hbar m_y - \hbar m_x = -\hbar\eta. \tag{37}$$


We assume that the components of an angular momentum are observables and
thus $m_z$ has eigenvalues. Let $m_z'$ be one of them, and $|m_z'\rangle$ an eigenket belonging
to it. From (34)
$$\langle m_z'|\,\bar\eta\eta\,|m_z'\rangle = \langle m_z'|\,\beta - m_z^2 + \hbar m_z\,|m_z'\rangle = (\beta - m_z'^2 + \hbar m_z')\langle m_z'|m_z'\rangle.$$
The left-hand side here is the square of the length of the ket $\eta|m_z'\rangle$ and is thus
greater than or equal to zero, the case of equality occurring if and only if $\eta|m_z'\rangle = 0$.
Hence
$$\beta - m_z'^2 + \hbar m_z' \ge 0,$$
or
$$\beta + \tfrac14\hbar^2 \ge (m_z' - \tfrac12\hbar)^2. \tag{38}$$
Thus
$$\beta + \tfrac14\hbar^2 \ge 0.$$
Defining the number $k$ by
$$k + \tfrac12\hbar = (\beta + \tfrac14\hbar^2)^{1/2} = (m_x^2 + m_y^2 + m_z^2 + \tfrac14\hbar^2)^{1/2}, \tag{39}$$
so that $k \ge -\tfrac12\hbar$, the inequality (38) becomes
$$k + \tfrac12\hbar \ge |m_z' - \tfrac12\hbar|$$
or
$$k + \hbar \ge m_z' \ge -k. \tag{40}$$
An equality occurs if and only if $\eta|m_z'\rangle = 0$. Similarly from (35)
$$\langle m_z'|\,\eta\bar\eta\,|m_z'\rangle = (\beta - m_z'^2 - \hbar m_z')\langle m_z'|m_z'\rangle,$$
showing that
$$\beta - m_z'^2 - \hbar m_z' \ge 0$$
or
$$k \ge m_z' \ge -k - \hbar,$$
with an equality occurring if and only if $\bar\eta|m_z'\rangle = 0$. This result combined with
(40) shows that $k \ge 0$ and
$$k \ge m_z' \ge -k, \tag{41}$$
with $m_z' = k$ if $\bar\eta|m_z'\rangle = 0$ and $m_z' = -k$ if $\eta|m_z'\rangle = 0$.

From (37)
$$m_z\eta|m_z'\rangle = (\eta m_z - \hbar\eta)|m_z'\rangle = (m_z' - \hbar)\eta|m_z'\rangle.$$
Now if $m_z' \ne -k$, $\eta|m_z'\rangle$ is not zero and is then an eigenket of $m_z$ belonging to
the eigenvalue $m_z' - \hbar$. Similarly, if $m_z' - \hbar \ne -k$, $m_z' - 2\hbar$ is another eigenvalue of $m_z$,
and so on. We get in this way a sequence of eigenvalues $m_z'$, $m_z' - \hbar$, $m_z' - 2\hbar$, ...,
which must terminate from (41), and can terminate only with the value $-k$.
Again, from the conjugate complex of equation (37)
$$m_z\bar\eta|m_z'\rangle = (\bar\eta m_z + \hbar\bar\eta)|m_z'\rangle = (m_z' + \hbar)\bar\eta|m_z'\rangle,$$
showing that $m_z' + \hbar$ is another eigenvalue of $m_z$ unless $\bar\eta|m_z'\rangle = 0$, in which
case $m_z' = k$. Continuing in this way we get a sequence of eigenvalues
$m_z'$, $m_z' + \hbar$, $m_z' + 2\hbar$, ..., which must terminate from (41), and can terminate
only with the value $k$. We can conclude that $2k$ is an integral multiple of $\hbar$ and
that the eigenvalues of $m_z$ are
$$k, \quad k - \hbar, \quad k - 2\hbar, \quad \ldots, \quad -k + \hbar, \quad -k. \tag{42}$$
The eigenvalues of $m_x$ and $m_y$ are the same, from symmetry. These eigenvalues are
all integral or half odd integral multiples of $\hbar$, according to whether $2k$ is an even
or odd multiple of $\hbar$.
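
The ladder argument above can be illustrated numerically. The following is a minimal sketch, not part of the original text: it works in units with ħ = 1, takes the integer 2k/ħ as input, and uses the hypothetical helper names `mz_eigenvalues` and `lowered_norm_factor`. It lists the sequence (42) and evaluates the factor β − m_z'² + ħm_z' from (34), which should vanish only at the bottom of the ladder.

```python
from fractions import Fraction


def mz_eigenvalues(two_k):
    """Eigenvalues (42) of m_z in units of hbar, for magnitude k = two_k / 2."""
    if two_k < 0:
        raise ValueError("2k/hbar must be a non-negative integer")
    k = Fraction(two_k, 2)
    # Step down from k in unit steps; the ladder reaches -k after 2k/hbar steps.
    return [k - step for step in range(two_k + 1)]


def lowered_norm_factor(two_k, m):
    """beta - m'^2 + hbar*m' from (34) with hbar = 1, using beta = k(k + 1) from (39)."""
    k = Fraction(two_k, 2)
    return k * (k + 1) - m * m + m


if __name__ == "__main__":
    for two_k in range(4):
        values = mz_eigenvalues(two_k)
        factors = [lowered_norm_factor(two_k, m) for m in values]
        print(two_k, [str(v) for v in values], [str(f) for f in factors])
```

Exact rational arithmetic is used so that the half odd integral values are represented without rounding.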
Let $|\mathrm{max}\rangle$ be an eigenket of $m_z$ belonging to the maximum eigenvalue $k$, so that
$$\bar\eta|\mathrm{max}\rangle = 0, \tag{43}$$
and form the sequence of kets
$$|\mathrm{max}\rangle, \quad \eta|\mathrm{max}\rangle, \quad \eta^2|\mathrm{max}\rangle, \quad \ldots, \quad \eta^{2k/\hbar}|\mathrm{max}\rangle. \tag{44}$$
These kets are all eigenkets of $m_z$, belonging to the sequence of eigenvalues (42)
respectively. The set of kets (44) is such that the operator $\eta$ applied to any one of
them gives a ket dependent on the set ($\eta$ applied to the last gives zero), and from
(36) and (43) one sees that $\bar\eta$ applied to any one of the set also gives a ket dependent
on the set. All the dynamical variables for the system we are now dealing with
are expressible in terms of $\eta$ and $\bar\eta$, so the set of kets (44) is a complete set.
There is just one of these kets for each eigenvalue (42) of $m_z$, so $m_z$ by itself forms
a complete commuting set of observables.

It is convenient to define the magnitude of the angular momentum vector $\mathbf{m}$
to be $k$, given by (39), rather than $\beta^{1/2}$, because the possible values for $k$ are
$$0, \quad \tfrac12\hbar, \quad \hbar, \quad \tfrac32\hbar, \quad 2\hbar, \quad \ldots, \tag{45}$$
extending to infinity, while the possible values for $\beta^{1/2}$ are a more complicated set
of numbers.

For a dynamical system involving other dynamical variables besides $m_x$, $m_y$
and $m_z$, there may be variables that do not commute with $\beta$. Then $\beta$ is no longer
a number, but a general linear operator. This happens for any orbital angular
momentum (22), as $x$, $y$, $z$, $p_x$, $p_y$ and $p_z$ do not commute with $\beta$. We shall assume
that $\beta$ is always an observable, and $k$ can then be defined by (39) with the positive
square root function and is also an observable. We shall call $k$ so defined
the magnitude of the angular momentum vector $\mathbf{m}$ in the general case. The above
analysis by which we obtained the eigenvalues of $m_z$ is still valid if we replace
$|m_z'\rangle$ by a simultaneous eigenket $|k'm_z'\rangle$ of the commuting observables $k$ and $m_z$,
and leads to the result that the possible eigenvalues for $k$ are the numbers (45),
and for each eigenvalue $k'$ of $k$ the eigenvalues of $m_z$ are the numbers (42) with $k'$
substituted for $k$. We have here an example of a phenomenon which we have not
met with previously, namely that with two commuting observables, the eigenvalues
of one depend on what eigenvalue we assign to the other. This phenomenon may be
understood as the two observables being not altogether independent, but partially
functions of one another. The number of independent simultaneous eigenkets of
$k$ and $m_z$ belonging to the eigenvalues $k'$ and $m_z'$ must be independent of $m_z'$,
since for each independent $|k'm_z'\rangle$ we can obtain an independent $|k'm_z''\rangle$, for any
$m_z''$ in the sequence (42), by multiplying $|k'm_z'\rangle$ by a suitable power of $\eta$ or $\bar\eta$.

As an example let us consider a dynamical system with two angular momenta
$\mathbf{m}_1$ and $\mathbf{m}_2$, which commute with one another. If there are no other dynamical
variables, then all the dynamical variables commute with the magnitudes $k_1$
and $k_2$ of $\mathbf{m}_1$ and $\mathbf{m}_2$, so $k_1$ and $k_2$ are numbers. However, the magnitude $K$
of the resultant angular momentum $\mathbf{M} = \mathbf{m}_1 + \mathbf{m}_2$ is not a number (it does
not commute with the components of $\mathbf{m}_1$ and $\mathbf{m}_2$) and it is interesting to work
out the eigenvalues of $K$. This can be done most simply by a method of
counting independent kets. There is one independent simultaneous eigenket of
$m_{1z}$ and $m_{2z}$ belonging to any eigenvalue $m_{1z}'$ having one of the values $k_1$, $k_1 - \hbar$,
$k_1 - 2\hbar$, ..., $-k_1$ and any eigenvalue $m_{2z}'$ having one of the values $k_2$, $k_2 - \hbar$,
$k_2 - 2\hbar$, ..., $-k_2$, and this ket is an eigenket of $M_z$ belonging to the eigenvalue
$M_z' = m_{1z}' + m_{2z}'$. The possible values of $M_z'$ are thus $k_1 + k_2$, $k_1 + k_2 - \hbar$,
$k_1 + k_2 - 2\hbar$, ..., $-k_1 - k_2$, and the number of times each of them occurs is given
by the following scheme (if we assume for definiteness that $k_1 \ge k_2$), in which
the counts are written with $k_2$ expressed in units of $\hbar$,
$$\begin{array}{ccccccccccc}
k_1+k_2, & k_1+k_2-\hbar, & k_1+k_2-2\hbar, & \ldots, & k_1-k_2, & k_1-k_2-\hbar, & \ldots, & -k_1+k_2, & -k_1+k_2-\hbar, & \ldots, & -k_1-k_2 \\
1 & 2 & 3 & \ldots & 2k_2+1 & 2k_2+1 & \ldots & 2k_2+1 & 2k_2 & \ldots & 1
\end{array} \tag{46}$$

Now each eigenvalue $K'$ of $K$ will be associated with the eigenvalues $K'$, $K' - \hbar$,
$K' - 2\hbar$, ..., $-K'$ for $M_z$, with the same number of independent simultaneous
eigenkets of $K$ and $M_z$ for each of them. The total number of independent
eigenkets of $M_z$ belonging to any eigenvalue $M_z'$ must be the same, whether we take
them to be simultaneous eigenkets of $m_{1z}$ and $m_{2z}$ or simultaneous eigenkets of
$K$ and $M_z$, i.e. it is always given by the scheme (46). It follows that the eigenvalues
for $K$ are*
$$k_1 + k_2, \quad k_1 + k_2 - \hbar, \quad k_1 + k_2 - 2\hbar, \quad \ldots, \quad k_1 - k_2, \tag{47}$$
and that for each of these eigenvalues for $K$ and an eigenvalue for $M_z$ going with
it there is just one independent simultaneous eigenket of $K$ and $M_z$.
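
The counting method just described lends itself to a short computational check. The sketch below is illustrative only and assumes units with ħ = 1; the function names `ladder` and `resultant_magnitudes` are hypothetical. It tallies the multiplicities of M_z' as in the scheme (46) and then repeatedly removes one complete set K', K' − 1, ..., −K', starting from the largest remaining value, which should reproduce the list (47).

```python
from collections import Counter
from fractions import Fraction


def ladder(two_k):
    """Values k, k - 1, ..., -k in units of hbar, for k = two_k / 2."""
    return [Fraction(two_k, 2) - j for j in range(two_k + 1)]


def resultant_magnitudes(two_k1, two_k2):
    """Eigenvalues of K for two commuting angular momenta, by counting kets."""
    # Multiplicity of each M_z' = m_1z' + m_2z', as in the scheme (46).
    counts = Counter(m1 + m2 for m1 in ladder(two_k1) for m2 in ladder(two_k2))
    magnitudes = []
    while counts:
        # The largest remaining M_z' must be the top of a multiplet with K' = M_z'.
        big_k = max(counts)
        magnitudes.append(big_k)
        # Remove one ket for each M_z' in K', K' - 1, ..., -K'.
        for mz in ladder(int(2 * big_k)):
            counts[mz] -= 1
            if counts[mz] == 0:
                del counts[mz]
    return magnitudes


if __name__ == "__main__":
    # k1 = 1, k2 = 1/2 (in units of hbar)
    print([str(k) for k in resultant_magnitudes(2, 1)])
```

Each magnitude appears exactly once in the returned list, in line with the statement that there is just one independent simultaneous eigenket of K and M_z for each allowed pair of eigenvalues.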

*[A minus sign precedes the last kj.]

The effect of rotations on eigenkets of angular momentum variables should
be noted. Take any eigenket $|M_z'\rangle$ of the z component of total angular momentum
for any dynamical system, and apply to it a small rotation through an angle $\delta\phi$
about the z-axis. It will change into
$$(1 + \delta\phi\, r_z)|M_z'\rangle = (1 - i\,\delta\phi\, M_z/\hbar)|M_z'\rangle$$
with the help of (32). This equals
$$(1 - i\,\delta\phi\, M_z'/\hbar)|M_z'\rangle = e^{-i\,\delta\phi\, M_z'/\hbar}|M_z'\rangle$$
to the first order in $\delta\phi$. Thus $|M_z'\rangle$ gets multiplied by the numerical factor $e^{-i\,\delta\phi\, M_z'/\hbar}$.
By applying a succession of these small rotations, we find that the application of
a finite rotation through an angle $\phi$ about the z-axis causes $|M_z'\rangle$ to get multiplied
by $e^{-i\phi M_z'/\hbar}$. Putting $\phi = 2\pi$, we find that an application of one revolution about
the z-axis leaves $|M_z'\rangle$ unchanged if the eigenvalue $M_z'$ is an integral multiple
of $\hbar$ and causes $|M_z'\rangle$ to change sign if $M_z'$ is half an odd integral multiple of $\hbar$.
Now consider an eigenket $|K'\rangle$ of the magnitude $K$ of the total angular momentum.
If the eigenvalue $K'$ is an integral multiple of $\hbar$, the possible eigenvalues of $M_z$ are
all integral multiples of $\hbar$ and the application of one revolution about the z-axis
must leave $|K'\rangle$ unchanged. Conversely, if $K'$ is half an odd integral multiple
of $\hbar$, the possible eigenvalues of $M_z$ are all half odd integral multiples of $\hbar$ and
the revolution must change the sign of $|K'\rangle$. From symmetry, the application
of a revolution about any other axis must have the same effect on $|K'\rangle$ as one
about the z-axis. We thus get the general result, the application of one revolution
about any axis leaves a ket unchanged or changes its sign according to whether
it belongs to eigenvalues of the magnitude of the total angular momentum which
are integral or half odd integral multiples of $\hbar$. A state, of course, is always
unaffected by the revolution, since a state is unaffected by a change of sign of
the ket corresponding to it.
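
The sign rule for one revolution can be checked numerically. This is a rough sketch under the assumption ħ = 1, with hypothetical function names: `revolution_factor` evaluates the factor exp(−iφM_z'/ħ), and `compounded_small_rotations` multiplies together many first-order factors (1 − iδφ M_z'/ħ) to show that they build up the same finite rotation.

```python
import cmath
import math


def revolution_factor(mz_over_hbar, angle=2 * math.pi):
    """Factor exp(-i * angle * M_z' / hbar) multiplying the eigenket |M_z'>."""
    return cmath.exp(-1j * angle * mz_over_hbar)


def compounded_small_rotations(mz_over_hbar, angle=2 * math.pi, steps=1_000_000):
    """Product of `steps` first-order factors (1 - i * dphi * M_z' / hbar)."""
    small = 1 - 1j * (angle / steps) * mz_over_hbar
    return small ** steps


if __name__ == "__main__":
    for mz in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(mz, revolution_factor(mz), compounded_small_rotations(mz))
```

The compounded product agrees with the exponential only to first order in the step size, so a small residual deviation from ±1 is expected for any finite number of steps.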

For a dynamical system involving only orbital angular momenta, a ket
must be unchanged by a revolution about an axis, since we can set up
Schrödinger's representation, with the coordinates of all the particles diagonal,
and the Schrödinger representative of a ket will get brought back to its original
value by the revolution. It follows that the eigenvalues of the magnitude of
an orbital angular momentum are always integral multiples of $\hbar$. The eigenvalues
of a component of an orbital angular momentum are also always integral multiples
of $\hbar$. For a spin angular momentum, Schrödinger's representation does not exist
and both kinds of eigenvalue are possible.


37. The spin of the electron 

Electrons and also some of the other fundamental particles (protons, neutrons)
have a spin whose magnitude is $\tfrac12\hbar$. This is found from experimental evidence,
and also there are theoretical reasons showing that this spin value is more
elementary than any other, even spin zero (see Chapter XI). The study of this
particular spin is therefore of special importance.

For dealing with an angular momentum $\mathbf{m}$ whose magnitude is $\tfrac12\hbar$, it is
convenient to put
$$\mathbf{m} = \tfrac12\hbar\,\boldsymbol{\sigma}. \tag{48}$$
The components of the vector $\boldsymbol{\sigma}$ then satisfy, from (27),
$$\begin{aligned}
\sigma_y\sigma_z - \sigma_z\sigma_y &= 2i\sigma_x, \\
\sigma_z\sigma_x - \sigma_x\sigma_z &= 2i\sigma_y, \\
\sigma_x\sigma_y - \sigma_y\sigma_x &= 2i\sigma_z.
\end{aligned} \tag{49}$$
The eigenvalues of $m_z$ are $\tfrac12\hbar$ and $-\tfrac12\hbar$, so the eigenvalues of $\sigma_z$ are 1 and $-1$,
and $\sigma_z^2$ has just the one eigenvalue 1. It follows that $\sigma_z^2$ must equal 1, and similarly
for $\sigma_x^2$ and $\sigma_y^2$, i.e.
$$\sigma_x^2 = \sigma_y^2 = \sigma_z^2 = 1. \tag{50}$$
We can get equations (49) and (50) into a simpler form by means of some
straightforward non-commutative algebra. From (50)
$$\sigma_y^2\sigma_z - \sigma_z\sigma_y^2 = 0$$
or
$$\sigma_y(\sigma_y\sigma_z - \sigma_z\sigma_y) + (\sigma_y\sigma_z - \sigma_z\sigma_y)\sigma_y = 0$$
or
$$\sigma_y\sigma_x + \sigma_x\sigma_y = 0$$
with the help of the first of equations (49). This means $\sigma_x\sigma_y = -\sigma_y\sigma_x$.
Two dynamical variables or linear operators like these which satisfy
the commutative law of multiplication except for a minus sign will be said
to anticommute. Thus $\sigma_x$ anticommutes with $\sigma_y$. From symmetry each of the three
dynamical variables $\sigma_x$, $\sigma_y$, $\sigma_z$ must anticommute with any other. Equations (49)
may now be written
$$\begin{aligned}
\sigma_y\sigma_z &= i\sigma_x = -\sigma_z\sigma_y, \\
\sigma_z\sigma_x &= i\sigma_y = -\sigma_x\sigma_z, \\
\sigma_x\sigma_y &= i\sigma_z = -\sigma_y\sigma_x,
\end{aligned} \tag{51}$$
and also from (50)
$$\sigma_x\sigma_y\sigma_z = i. \tag{52}$$
Equations (50), (51) and (52) are the fundamental equations satisfied by the spin
variables $\boldsymbol{\sigma}$ describing a spin whose magnitude is $\tfrac12\hbar$.

Let us set up a matrix representation for the $\sigma$'s and let us take $\sigma_z$ to be
diagonal. If there are no other independent dynamical variables besides the $m$'s or
$\sigma$'s in our dynamical system, then $\sigma_z$ by itself forms a complete set of commuting
observables, since the form of equations (50) and (51) is such that we cannot
construct out of $\sigma_x$, $\sigma_y$ and $\sigma_z$ any new dynamical variable that commutes with $\sigma_z$.
The diagonal elements of the matrix representing $\sigma_z$ being the eigenvalues 1 and
$-1$ of $\sigma_z$, the matrix itself will be
$$\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$
Let $\sigma_x$ be represented by
$$\begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix}.$$
This matrix must be Hermitian, so that $a_1$ and $a_4$ must be real and $a_2$ and $a_3$
conjugate complex numbers. The equation $\sigma_z\sigma_x = -\sigma_x\sigma_z$ gives us
$$\begin{pmatrix} a_1 & a_2 \\ -a_3 & -a_4 \end{pmatrix} = -\begin{pmatrix} a_1 & -a_2 \\ a_3 & -a_4 \end{pmatrix},$$
so that $a_1 = a_4 = 0$. Hence $\sigma_x$ is represented by a matrix of the form
$$\begin{pmatrix} 0 & a_2 \\ a_3 & 0 \end{pmatrix}.$$
The equation $\sigma_x^2 = 1$ now shows that $a_2 a_3 = 1$. Thus $a_2$ and $a_3$, being
conjugate complex numbers, must be of the form $e^{i\alpha}$ and $e^{-i\alpha}$ respectively, where
$\alpha$ is a real number, so that $\sigma_x$ is represented by a matrix of the form
$$\begin{pmatrix} 0 & e^{i\alpha} \\ e^{-i\alpha} & 0 \end{pmatrix}.$$
Similarly it may be shown that $\sigma_y$ is also represented by a matrix of this form.
By suitably choosing the phase factors in the representation, which is not
completely determined by the condition that $\sigma_z$ shall be diagonal, we can arrange
that $\sigma_x$ shall be represented by the matrix
$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
The representative of $\sigma_y$ is then determined by the equation $\sigma_y = i\sigma_x\sigma_z$. We thus
obtain finally the three matrices
$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \tag{53}$$
to represent $\sigma_x$, $\sigma_y$ and $\sigma_z$ respectively, which matrices satisfy all the algebraic
relations (49), (50), (51) and (52). The component of the vector $\boldsymbol{\sigma}$ in an arbitrary
direction specified by the direction cosines $l$, $m$, $n$, namely $l\sigma_x + m\sigma_y + n\sigma_z$,
is represented by
$$\begin{pmatrix} n & l - im \\ l + im & -n \end{pmatrix}. \tag{54}$$
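
That the three matrices satisfy the algebraic relations can be confirmed by direct multiplication. The following is a small sketch in plain Python, with hypothetical helper names (`matmul`, `scaled`, `commutator`, `check_spin_relations`), that tests the relations (49), (50), (51) and (52) entry by entry.

```python
def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def scaled(c, a):
    """Matrix a multiplied by the number c."""
    return tuple(tuple(c * entry for entry in row) for row in a)


def commutator(a, b):
    """ab - ba for 2x2 matrices."""
    ab, ba = matmul(a, b), matmul(b, a)
    return tuple(tuple(ab[i][j] - ba[i][j] for j in range(2)) for i in range(2))


IDENTITY = ((1, 0), (0, 1))
SIGMA_X = ((0, 1), (1, 0))
SIGMA_Y = ((0, -1j), (1j, 0))
SIGMA_Z = ((1, 0), (0, -1))


def check_spin_relations():
    """Return a dict mapping each equation number to True if the matrices satisfy it."""
    sx, sy, sz = SIGMA_X, SIGMA_Y, SIGMA_Z
    return {
        "(49)": commutator(sy, sz) == scaled(2j, sx)
        and commutator(sz, sx) == scaled(2j, sy)
        and commutator(sx, sy) == scaled(2j, sz),
        "(50)": all(matmul(s, s) == IDENTITY for s in (sx, sy, sz)),
        "(51)": matmul(sy, sz) == scaled(1j, sx) == scaled(-1, matmul(sz, sy))
        and matmul(sz, sx) == scaled(1j, sy) == scaled(-1, matmul(sx, sz))
        and matmul(sx, sy) == scaled(1j, sz) == scaled(-1, matmul(sy, sx)),
        "(52)": matmul(matmul(sx, sy), sz) == scaled(1j, IDENTITY),
    }


if __name__ == "__main__":
    print(check_spin_relations())
```

All entries are small integers or multiples of i, so the comparisons are exact and no numerical tolerance is needed.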

The representative of a ket vector will consist of just two numbers,
corresponding to the two values $+1$ and $-1$ for $\sigma_z'$. These two numbers form
a function of the variable $\sigma_z'$ whose domain consists of only the two points
$+1$ and $-1$. The state for which $\sigma_z$ has the value unity will be represented by
the function, $f_\alpha(\sigma_z')$ say, consisting of the pair of numbers 1, 0 and that for which
$\sigma_z$ has the value $-1$ will be represented by the function, $f_\beta(\sigma_z')$ say, consisting of
the pair 0, 1. Any function of the variable $\sigma_z'$, i.e. any pair of numbers, can be
expressed as a linear combination of these two. Thus any state can be obtained
by superposition of the two states for which $\sigma_z$ equals $+1$ and $-1$ respectively.
For example, the state for which the component of $\boldsymbol{\sigma}$ in the direction $l$, $m$, $n$,
represented by (54), has the value $+1$ is represented by the pair of numbers $a$, $b$
which satisfy
$$\begin{pmatrix} n & l - im \\ l + im & -n \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix}$$
or
$$na + (l - im)b = a, \qquad (l + im)a - nb = b.$$
Thus
$$\frac{a}{b} = \frac{l - im}{1 - n} = \frac{1 + n}{l + im}.$$
This state can be regarded as a superposition of the two states for which $\sigma_z$ equals
$+1$ and $-1$, the relative weights in the superposition process being as
$$|a|^2 : |b|^2 = |l - im|^2 : (1 - n)^2 = 1 + n : 1 - n. \tag{55}$$
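
The ratio (55) can be checked for a particular direction. The sketch below is only illustrative and uses hypothetical function names: it builds the unnormalized pair a, b from a/b = (l − im)/(1 − n), applies the matrix (54) to it, and returns the normalized weights, which by (55) should come out as (1 + n)/2 and (1 − n)/2.

```python
import math


def spin_up_along(l, m, n):
    """Unnormalized pair (a, b) with a/b = (l - i m)/(1 - n), for direction cosines l, m, n."""
    if math.isclose(n, 1.0):
        # Direction along +z: the ratio a/b is infinite, so the state is (1, 0).
        return (1 + 0j, 0j)
    return (complex(l, -m), complex(1 - n, 0))


def apply_direction_matrix(l, m, n, a, b):
    """Apply the matrix (54) to the pair (a, b)."""
    return (n * a + complex(l, -m) * b, complex(l, m) * a - n * b)


def weights(l, m, n):
    """Normalized weights |a|^2 and |b|^2 of the sigma_z = +1 and -1 components."""
    a, b = spin_up_along(l, m, n)
    total = abs(a) ** 2 + abs(b) ** 2
    return abs(a) ** 2 / total, abs(b) ** 2 / total


if __name__ == "__main__":
    l, m, n = 0.6, 0.0, 0.8
    print(weights(l, m, n), ((1 + n) / 2, (1 - n) / 2))
```

The direction cosines are assumed to satisfy l² + m² + n² = 1; the function does not enforce this.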
For the complete description of an electron (or other elementary particle with
spin $\tfrac12\hbar$) we require the spin dynamical variables $\boldsymbol{\sigma}$, whose connexion with the spin
angular momentum is given by (48), together with the Cartesian coordinates
$x$, $y$, $z$ and momenta $p_x$, $p_y$, $p_z$. The spin dynamical variables commute with
these coordinates and momenta. Thus a complete set of commuting observables
for a system consisting of a single electron will be $x$, $y$, $z$, $\sigma_z$. In a representation
in which these are diagonal, the representative of any state will be a function of
four variables $x'$, $y'$, $z'$, $\sigma_z'$. Since $\sigma_z'$ has a domain consisting of only two points,
namely 1 and $-1$, this function of four variables is the same as two functions of
three variables, namely the two functions
$$\langle x'y'z'|\rangle_+ = \langle x', y', z', +1|\rangle, \qquad \langle x'y'z'|\rangle_- = \langle x', y', z', -1|\rangle. \tag{56}$$
Thus the presence of the spin may be considered either as introducing a new variable
into the representative of a state or as giving this representative two components.


38. Motion in a central field of force 
An atom consists of a massive positively charged nucleus together with a number 
of electrons moving round, under the influence of the attractive force of the nucleus 
and their own mutual repulsions. An exact treatment of this dynamical system 
is a very difficult mathematical problem. One can, however, gain some insight 
into the main features of the system by making the rough approximation of 
regarding each electron as moving independently in a certain central field of force, 
namely that of the nucleus, assumed fixed, together with some kind of average of 
the forces due to the other electrons. Thus our present problem of the motion of 
a particle in a central field of force forms a corner-stone in the theory of the atom. 

Let the Cartesian coordinates of the particle, referred to a system of axes with
the centre of force as origin, be $x$, $y$, $z$ and the corresponding components of
momentum $p_x$, $p_y$, $p_z$. The Hamiltonian, with neglect of relativistic mechanics,
will be of the form*
$$H = \frac{1}{2m}(p_x^2 + p_y^2 + p_z^2) + V, \tag{57}$$
where $V$, the potential energy, is a function only of $(x^2 + y^2 + z^2)$. To develop
the theory it is convenient to introduce polar dynamical variables. We introduce
first the radius $r$, defined as the positive square root
$$r = (x^2 + y^2 + z^2)^{1/2}.$$
Its eigenvalues go from 0 to $\infty$. If we evaluate its P.B.s with $p_x$, $p_y$ and $p_z$,
we obtain, with the help of formula (32) of §22,
$$[r, p_x] = \frac{\partial r}{\partial x} = \frac{x}{r}, \qquad [r, p_y] = \frac{y}{r}, \qquad [r, p_z] = \frac{z}{r},$$
the same as in the classical theory. We introduce also the dynamical variable $p_r$
defined by
$$p_r = r^{-1}(x p_x + y p_y + z p_z). \tag{58}$$
Its P.B. with $r$ is given by
$$\begin{aligned}
r[r, p_r] = [r, r p_r] &= [r, x p_x + y p_y + z p_z] \\
&= x[r, p_x] + y[r, p_y] + z[r, p_z] \\
&= x\,\frac{x}{r} + y\,\frac{y}{r} + z\,\frac{z}{r} = r.
\end{aligned}$$
Hence
$$[r, p_r] = 1$$
or
$$r p_r - p_r r = i\hbar.$$

*[Original drafts have a point ‘.’ to indicate the multiplication of two factors that are
separated by the point. This point has been left out and the factors reformatted to allow a more
elegant expression. Each case of the usage is considered separately.]

The commutation relation between $r$ and $p_r$ is just the one for a canonical
coordinate and momentum, namely equation (10) of §22. This makes $p_r$ like
the momentum conjugate to the $r$ coordinate, but it is not exactly equal to
this momentum because it is not real, its conjugate complex being
$$\begin{aligned}
\bar p_r = (p_x x + p_y y + p_z z)r^{-1} &= (x p_x + y p_y + z p_z - 3i\hbar)r^{-1} \\
&= (r p_r - 3i\hbar)r^{-1} = p_r - 2i\hbar r^{-1}.
\end{aligned} \tag{59}$$
Thus $p_r - i\hbar r^{-1}$ is real and is the true momentum conjugate, the radial momentum,†
to $r$.

†[‘radial momentum’ is included to match the index entry.]

The angular momentum $\mathbf{m}$ of the particle about the origin is given by (22) and
its magnitude $k$ is given by (39). Since $r$ and $p_r$ are scalars, they commute with
$\mathbf{m}$, and therefore also with $k$.

We can express the Hamiltonian in terms of $r$, $p_r$ and $k$. We have, if $\sum_{xyz}$ denotes
a sum over cyclic permutations of the suffixes $x$, $y$, $z$,
$$\begin{aligned}
k(k + \hbar) &= \sum_{xyz} m_z^2 = \sum_{xyz} (x p_y - y p_x)^2 \\
&= \sum_{xyz} (x p_y x p_y + y p_x y p_x - x p_y y p_x - y p_x x p_y) \\
&= \sum_{xyz} (x^2 p_y^2 + y^2 p_x^2 - x p_x p_y y - y p_y p_x x + x^2 p_x^2 - x p_x p_x x - 2i\hbar\, x p_x) \\
&= (x^2 + y^2 + z^2)(p_x^2 + p_y^2 + p_z^2) - (x p_x + y p_y + z p_z)(p_x x + p_y y + p_z z + 2i\hbar) \\
&= r^2(p_x^2 + p_y^2 + p_z^2) - r p_r(\bar p_r r + 2i\hbar) \\
&= r^2(p_x^2 + p_y^2 + p_z^2) - r p_r^2 r,
\end{aligned}$$
from (59). Hence
$$H = \frac{1}{2m}\left(\frac{1}{r}\,p_r^2\, r + \frac{k(k + \hbar)}{r^2}\right) + V. \tag{60}$$
This form for $H$ is such that $k$ commutes not only with $H$, as is necessary since
$k$ is a constant of the motion, but also with every dynamical variable occurring
in $H$, namely $r$, $p_r$ and $V$, which is a function of $r$. In consequence, a simple
treatment becomes possible, namely, we may consider an eigenstate of $k$ belonging
to an eigenvalue $k'$ and then we can substitute $k'$ for $k$ in (60) and get a problem
in one degree of freedom $r$.

Let us introduce Schrödinger's representation with $x$, $y$, $z$ diagonal.
Then $p_x$, $p_y$, $p_z$ are equal to the operators $-i\hbar\,\partial/\partial x$, $-i\hbar\,\partial/\partial y$, $-i\hbar\,\partial/\partial z$
respectively. A state is represented by a wave function $\psi(x, y, z, t)$ satisfying
Schrödinger's wave equation (7) of §27, which now reads, with $H$ given by (57),
$$i\hbar\frac{\partial\psi}{\partial t} = \left\{-\frac{\hbar^2}{2m}\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}\right) + V\right\}\psi. \tag{61}$$

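We may pass from the Cartesian coordinates $x$, $y$, $z$ to the polar coordinates $r$, $\theta$, $\phi$
by means of the equations
$$x = r\sin\theta\cos\phi, \qquad y = r\sin\theta\sin\phi, \qquad z = r\cos\theta, \tag{62}$$
and may express the wave function in terms of the polar coordinates, so that it
reads $\psi(r, \theta, \phi, t)$. The equations (62) give the operator equation
$$\frac{\partial}{\partial r} = \frac{\partial x}{\partial r}\frac{\partial}{\partial x} + \frac{\partial y}{\partial r}\frac{\partial}{\partial y} + \frac{\partial z}{\partial r}\frac{\partial}{\partial z} = \frac{x}{r}\frac{\partial}{\partial x} + \frac{y}{r}\frac{\partial}{\partial y} + \frac{z}{r}\frac{\partial}{\partial z},$$
which shows, on being compared with (58), that $p_r = -i\hbar\,\partial/\partial r$.
Thus Schrödinger's wave equation reads, with the form (60) for $H$,
$$i\hbar\frac{\partial\psi}{\partial t} = \left\{\frac{\hbar^2}{2m}\left(-\frac{1}{r}\frac{\partial^2}{\partial r^2}\,r + \frac{k(k + \hbar)}{\hbar^2 r^2}\right) + V\right\}\psi. \tag{63}$$
Here $k$ is a certain linear operator which, since it commutes with $r$ and $\partial/\partial r$,
can involve only $\theta$, $\phi$, $\partial/\partial\theta$ and $\partial/\partial\phi$. From the formula
$$k(k + \hbar) = m_x^2 + m_y^2 + m_z^2, \tag{64}$$
which comes from (39), and from (62) one can work out the form of $k(k + \hbar)$ and
one finds
$$\frac{k(k + \hbar)}{\hbar^2} = -\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\sin\theta\frac{\partial}{\partial\theta} - \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\phi^2}. \tag{65}$$
This operator is well known in mathematical physics. Its eigenfunctions are
called spherical harmonics and its eigenvalues are $n(n + 1)$ where $n$ is an integer.
Thus the theory of spherical harmonics provides an alternative proof that
the eigenvalues of $k$ are integral multiples of $\hbar$.

For an eigenstate of $k$ belonging to the eigenvalue $n\hbar$ ($n$ a non-negative integer)
the wave function will be of the form
$$\psi = r^{-1}\chi(r, t)\,S_n(\theta, \phi), \tag{66}$$
where $S_n(\theta, \phi)$ satisfies
$$k(k + \hbar)\,S_n(\theta, \phi) = n(n + 1)\hbar^2\,S_n(\theta, \phi), \tag{67}$$
i.e. from (65) $S_n$ is a spherical harmonic of order $n$. The factor $r^{-1}$ is inserted in
(66) for convenience. Substituting (66) into (63), we get as the equation for $\chi$
$$i\hbar\frac{\partial\chi}{\partial t} = \left\{\frac{\hbar^2}{2m}\left(-\frac{\partial^2}{\partial r^2} + \frac{n(n + 1)}{r^2}\right) + V\right\}\chi. \tag{68}$$
If the state is a stationary state belonging to the energy value $H'$, $\chi$ will be of
the form
$$\chi(r, t) = \chi_0(r)\,e^{-iH't/\hbar}$$
and (68) will reduce to
$$H'\chi_0 = \left\{\frac{\hbar^2}{2m}\left(-\frac{d^2}{dr^2} + \frac{n(n + 1)}{r^2}\right) + V\right\}\chi_0. \tag{69}$$
This equation may be used to determine the energy-levels $H'$ of the system.
For each solution $\chi_0$ of (69), arising from a given $n$, there will be $2n + 1$
independent states, because there are $2n + 1$ independent solutions of (67)
corresponding to the $2n + 1$ different values that a component of the angular
momentum, $m_z$ say, can take on.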

The probability of the particle being in an element of volume $dx\,dy\,dz$
is proportional to $|\psi|^2\,dx\,dy\,dz$. With $\psi$ of the form (66) this becomes
$r^{-2}|\chi|^2|S_n|^2\,dx\,dy\,dz$. The probability of the particle being in a spherical shell
between $r$ and $r + dr$ is then proportional to $|\chi|^2\,dr$. It now becomes clear that,
in solving equation (68) or (69), we must impose a boundary condition on
the function $\chi$ at $r = 0$, namely the function must be such that the integral to
the origin $\int_0 |\chi|^2\,dr$ is convergent. If this integral were not convergent, the wave
function would represent a state for which the chances are infinitely in favour of
the particle being at the origin and such a state would not be physically admissible.

The boundary condition at r = 0 obtained by the above consideration of 
probabilities is, however, not sufficiently stringent. We get a more stringent 
condition by verifying that the wave function obtained by solving the wave 
equation in polar coordinates (63) really satisfies the wave equation in Cartesian 
coordinates (61). Let us take the case of V = 0, giving us the problem of the free 
particle. Applied to a stationary state with energy $H' = 0$, equation (61) gives
$$\nabla^2\psi = 0, \tag{70}$$
where $\nabla^2$ is written for the Laplacian operator $\partial^2/\partial x^2 + \partial^2/\partial y^2 + \partial^2/\partial z^2$,
and equation (63) gives
$$\left(\frac{1}{r}\frac{\partial^2}{\partial r^2}\,r - \frac{k(k + \hbar)}{\hbar^2 r^2}\right)\psi = 0. \tag{71}$$
A solution of (71) for $k = 0$ is $\psi = r^{-1}$. This does not satisfy (70), since, although
$\nabla^2 r^{-1}$ vanishes for any finite value of $r$, its integral through a volume containing
the origin is $-4\pi$ (as may be verified by transforming this volume integral to
a surface integral by means of Gauss's theorem), and hence
$$\nabla^2 r^{-1} = -4\pi\,\delta(x)\delta(y)\delta(z). \tag{72}$$

Thus not every solution of (71) gives a solution of (70), and more generally,
not every solution of (63) is a solution of (61). We must impose on the solution
of (63) the condition that it shall not tend to infinity as rapidly as $r^{-1}$ when
$r \to 0$ in order that, when substituted into (61), it shall not give a $\delta$ function on
the right like the right-hand side of (72). Only when equation (63) is supplemented
with this condition does it become equivalent to equation (61). We thus have
the boundary condition $r\psi \to 0$ or $\chi \to 0$ as $r \to 0$.

There are also boundary conditions for the wave function as $r \to \infty$. If we are
interested only in ‘closed’ states, i.e. states for which the particle does not go off
to infinity, we must restrict the integral to infinity $\int^\infty |\chi(r)|^2\,dr$ to be convergent.
These closed states, however, are not the only ones that are physically permissible,
as we can also have states in which the particle arrives from infinity, is scattered
by the central field of force, and goes off to infinity again. For these states
the wave function may remain finite as $r \to \infty$. Such states will be dealt with
in Chapter VIII under the heading of collision problems. In any case the wave
function must not tend to infinity as $r \to \infty$, or it will represent a state that has
no physical meaning.


39. Energy-levels of the hydrogen atom 
The above analysis may be applied to the problem of the hydrogen atom with 
neglect of relativistic mechanics and the spin of the electron. The potential 
energy V is now* —e?/r, so that equation (69) becomes 
d= n(n+1). 2me? 1 2mH' 

iz pe | ep [Xo Re XO e) 
A thorough investigation of this equation has been given by Erwin Schrédinger! 
We shall here obtain its eigenvalues H’ by an elementary argument. 


It is convenient to put Xo = f(r)e*/ (74) 
introducing the new function f(r), where a is one or other of the square roots 
a=+,/(-h?/2mH’). (75) 


Equation (73) now becomes 
da 2d n(n+i1 2me? 1 
oe 0 
dr adr ie ie & 
We look for a solution of this equation in the form of a power series 


ie) = S- om ae (77) 


in which consecutive values for s differ by unity although these values themselves 
need not be integers. On substituting (77) in (76) we obtain 


S- c.{s(s — 1)r*-? — (28/a)r*—! — n(n + 1)r*-? + (Qme?/h*)r*—"} = 0, 


(76) 


which gives, on equating to zero the coefficient of r*~?, the following relation 
between successive coefficients c., 

c.[s(s — 1) — n(n + 1)] = c_1[2(s — 1)/a — 2me? /hr’]. (78) 
We saw in the preceding section that only those eigenfunctions x are allowed 
that tend to zero with r and hence, from (74), f(r) must tend to zero with r. 
The series (77) must therefore terminate on the side of small s and the minimum 
value of s must be greater than zero. Now the only possible minimum values 
of s are those that make the coefficient of c, in (78) vanish, i.e. n + 1 and —n, 
and the second of these is negative or zero. Thus the minimum value of s must be 
n+ 1. Since n is always an integer, the values of s will all be integers. The series 


*The e here, denoting minus the charge on an electron, is, of course, to be distinguished from 
the e denoting the base of exponentials. 

†Schrödinger, E. (1926). „Quantisierung als Eigenwertproblem“, Annalen der Physik, 384(4), 
pp. 361-376. [doi: 10.1002/andp.19263840404] 
(77) will in general extend to infinity on the side of large $s$. For large values of $s$ 
the ratio of successive terms is $$\frac{c_s}{c_{s-1}}\,r = \frac{2r}{sa}$$
according to (78). Thus the series (77) will always converge, as the ratios of 
the higher terms to one another are the same as for the series 
$$\sum_s \frac{1}{s!}\left(\frac{2r}{a}\right)^s, \tag{79}$$
which converges to $e^{2r/a}$. 

We must now examine how our solution $\chi_0$ behaves for large values of $r$. 
We must distinguish between the two cases of $H'$ positive and $H'$ negative. For $H'$ 
negative, $a$ given by (75) will be real. Suppose we take the positive value for $a$. 
Then as $r \to \infty$ the sum of the series (77) will tend to infinity according to the same 
law as the sum of the series (79), i.e. the law $e^{2r/a}$. Thus, from (74), $\chi_0$ will tend to 
infinity according to the law $e^{r/a}$ and will not represent a physically possible state. 
There is therefore in general no permissible solution of (73) for negative values 
of $H'$. An exception arises, however, whenever the series (77) terminates on the side 
of large $s$, in which case the boundary conditions are all satisfied. The condition 
for this termination of the series is that the coefficient of $c_{s-1}$ in (78) shall vanish 
for some value of the suffix $s-1$ not less than its minimum value $n+1$, which is 
the same as the condition that $$\frac{s}{a} - \frac{me^2}{\hbar^2} = 0$$
for some integer $s$ not less than $n+1$. With the help of (75) this condition becomes 
$$H' = -\frac{me^4}{2s^2\hbar^2}, \tag{80}$$
and is thus a condition for the energy-level $H'$. Since $s$ may be any positive 
integer, the formula (80) gives a discrete set of negative energy-levels for 
the hydrogen atom. These are in agreement with experiment. For each of them 
(except the lowest one $s = 1$) there are several independent states, as there are 
various possible values for $n$, namely any§ positive integer or zero less than $s$. 
This multiplicity of states belonging to an energy-level is in addition to that 
mentioned in the preceding section arising from the various possible values for 
a component of angular momentum, which latter multiplicity occurs with any 
central field of force. The $n$ multiplicity occurs only with an inverse square law of 
force and even then is removed when one takes relativistic mechanics into account, 
as will be found in Chapter XI. The solution $\chi_0$ of (73) when $H'$ satisfies (80) tends 
to zero exponentially as $r \to \infty$ and thus represents a closed state (corresponding 
to an elliptic orbit in Bohr’s theory). 
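
The termination argument can be checked with exact rational arithmetic. The following is only a minimal sketch, assuming units in which $\hbar = m = e = 1$, so that (80) reads $H' = -1/2s^2$ and the terminating value of $a$ is the integer at which the series stops. The function names and the parameter `s_top` are illustrative and do not occur in the text.

```python
from fractions import Fraction


def energy_level(s_top):
    """Energy-level (80) in units hbar = m = e = 1 (an assumed choice of units)."""
    return Fraction(-1, 2 * s_top**2)


def series_coefficients(n, s_top, extra=3):
    """Coefficients c_s of the series (77), generated by the recurrence (78).

    The series starts at s = n + 1 with c_{n+1} = 1 (arbitrary normalization)
    and uses a = s_top, the value demanded by the termination condition.
    A few coefficients beyond s_top are computed to show that they vanish.
    """
    if s_top < n + 1:
        raise ValueError("s_top must be at least n + 1")
    a = Fraction(s_top)
    coeffs = {n + 1: Fraction(1)}
    for s in range(n + 2, s_top + extra + 1):
        numerator = 2 * Fraction(s - 1) / a - 2    # 2(s-1)/a - 2me^2/hbar^2
        denominator = s * (s - 1) - n * (n + 1)    # s(s-1) - n(n+1)
        coeffs[s] = coeffs[s - 1] * numerator / denominator
    return coeffs


if __name__ == "__main__":
    for s_top in (1, 2, 3):
        for n in range(s_top):
            c = series_coefficients(n, s_top)
            print(s_top, n, energy_level(s_top), [str(c[s]) for s in sorted(c)])
```

For each `s_top` the loop runs over the values of $n$ from zero to `s_top - 1`, which are the independent states belonging to one energy-level; every coefficient with suffix greater than `s_top` comes out as zero.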

For any positive values of $H'$, $a$ given by (75) will be¶ imaginary. The series (77), 
which is like the series (79) for large $r$, will now have a sum that remains finite as 


§[Original: ‘positive or zero integer’] 
¶[‘pure’ omitted] 

$r \to \infty$. Thus $\chi_0$ given by (74) will now remain finite as $r \to \infty$ and will therefore 
be a permissible solution of (73), giving a wave function $\psi$ that tends to zero 
according to the law $r^{-1}$ as $r \to \infty$. Hence in addition to the discrete set of negative 
energy-levels (80), all positive energy-levels are allowed. The states of positive 
energy are not closed, since for them the integral to infinity $\int^\infty |\chi_0|^2\,dr$ does 
not converge. (These states correspond to the hyperbolic orbits of Bohr’s theory.) 


40. Selection rules 

If a dynamical system is set up in a certain stationary state, it will remain 
in that stationary state so long as it is not acted upon by outside forces. 
Any atomic system in practice, however, frequently gets acted upon by external 
electromagnetic fields, under whose influence it is liable to cease to be in one 
stationary state and to make a transition to another. The theory of such transitions 
will be developed in §§44 and 45. A result of this theory is that, to a high 
degree of accuracy, transitions between two states cannot occur under the influence 
of electromagnetic radiation if, in a Heisenberg representation with these two 
stationary states as two of the basic states, the matrix element, referring to 
these two states, of the representative of the total electric displacement D of 
the system vanishes. Now it happens for many atomic systems that the great 
majority of the matrix elements of D in a Heisenberg representation do vanish, 
and hence there are severe limitations on the possibilities for transitions. The rules 
that express these limitations are called selection rules. 

The idea of selection rules can be refined by a more detailed application of 
the theory of §§44 and 45, according to which the matrix elements of the different 
Cartesian components of the vector D are associated with different states of 
polarization of the electromagnetic radiation. The nature of this association is 
just what one would get if one considered the matrix elements, or rather their 
real parts, as the amplitudes of harmonic oscillators which interact with the field 
of radiation according to classical electrodynamics. 

There is a general method for obtaining all selection rules, as follows. Let us call 
the constants of the motion which are diagonal in the Heisenberg representation $\alpha$’s 
and let $D$ be one of the Cartesian components of $\mathbf{D}$. We must obtain an algebraic 
equation connecting $D$ and the $\alpha$’s which does not involve any dynamical variables 
other than $D$ and the $\alpha$’s and which is linear in $D$. Such an equation will be of 
the form $$\sum_r f_r D g_r = 0, \tag{81}$$
where the $f_r$’s and $g_r$’s are functions of the $\alpha$’s only. If this equation is expressed 
in terms of representatives, it gives us 
$$\sum_r f_r(\alpha')\,\langle\alpha'|D|\alpha''\rangle\,g_r(\alpha'') = 0,$$
or $$\langle\alpha'|D|\alpha''\rangle \sum_r f_r(\alpha')\,g_r(\alpha'') = 0,$$
which shows that $\langle\alpha'|D|\alpha''\rangle = 0$ unless 
$$\sum_r f_r(\alpha')\,g_r(\alpha'') = 0. \tag{82}$$

This last equation, giving the connexion which must exist between $\alpha'$ and $\alpha''$ in 
order that $\langle\alpha'|D|\alpha''\rangle$ may not vanish, constitutes the selection rule, so far as 
the component $D$ of $\mathbf{D}$ is concerned. 

Our work on the harmonic oscillator in §34 provides an example of 
a selection rule. Equation (8) is of the form (81) with $\bar\eta$ for $D$ and $H$ playing 
the part of the $\alpha$’s, and it shows that the matrix elements $\langle H'|\bar\eta|H''\rangle$ of $\bar\eta$ all 
vanish except those for which $H'' - H' = \hbar\omega$. The conjugate complex of this result 
is that the matrix elements $\langle H'|\eta|H''\rangle$ of $\eta$ all vanish except those for which 
$H'' - H' = -\hbar\omega$. Since $q$ is a numerical multiple of $\eta - \bar\eta$, its matrix elements 
$\langle H'|q|H''\rangle$ all vanish except those for which $H'' - H' = \pm\hbar\omega$. If the harmonic 
oscillator carries an electric charge, its electric displacement $D$ will be proportional 
to $q$. The selection rule is then that only those transitions can take place in which 
the energy $H$ changes by a single quantum $\hbar\omega$. 

We shall now obtain the selection rules for $m_z$ and $k$ for an electron moving 
in a central field of force. The components of electric displacement are here 
proportional to the Cartesian coordinates $x$, $y$, $z$. Taking first $m_z$, we have that 
$m_z$ commutes with $z$, or that $$m_z z - z m_z = 0.$$

This is an equation of the required type (81), giving us the selection rule 
$$m_z' - m_z'' = 0$$
for the $z$-component of the displacement. Again, from equations (23) we have 
$$[m_z, [m_z, x]] = [m_z, y] = -x$$
or $$m_z^2 x - 2m_z x m_z + x m_z^2 - \hbar^2 x = 0,$$
which is also of the type (81) and gives us the selection rule 
$$m_z'^2 - 2m_z' m_z'' + m_z''^2 - \hbar^2 = 0$$
or $$(m_z' - m_z'' - \hbar)(m_z' - m_z'' + \hbar) = 0$$
for the $x$-component of the displacement. The selection rule for the $y$-component 
is the same. Thus our selection rules for $m_z$ are that in transitions associated with 
radiation with a polarization corresponding to an electric dipole in the $z$-direction, 
$m_z'$ cannot change, while in transitions associated with a polarization corresponding 
to an electric dipole in the $x$-direction or $y$-direction, $m_z'$ must change by $\pm\hbar$. 
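
The way a condition of the form (82) picks out the permitted pairs of eigenvalues can be illustrated by direct enumeration. The sketch below assumes units with $\hbar = 1$, so that the eigenvalues of $m_z$ are integers; the helper names `allowed_transitions`, `z_rule` and `x_rule` are hypothetical and serve only this illustration.

```python
def allowed_transitions(rule, values):
    """Pairs (m1, m2) for which the selection-rule expression (82) vanishes."""
    return [(m1, m2) for m1 in values for m2 in values if rule(m1, m2) == 0]


def z_rule(m1, m2):
    # From m_z z - z m_z = 0: the expression is m_z' - m_z''.
    return m1 - m2


def x_rule(m1, m2):
    # From m_z^2 x - 2 m_z x m_z + x m_z^2 - hbar^2 x = 0, with hbar = 1.
    return m1**2 - 2 * m1 * m2 + m2**2 - 1


if __name__ == "__main__":
    m_values = range(-2, 3)
    print("z-component:", allowed_transitions(z_rule, m_values))
    print("x-component:", allowed_transitions(x_rule, m_values))
```

The first list contains only pairs with equal values, and the second only pairs differing by one unit, in line with the two rules just stated.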

We can determine more accurately the state of polarization of the radiation 
associated with a transition in which $m_z'$ changes by $\pm\hbar$, by considering 
the condition for the non-vanishing of matrix elements of $x+iy$ and $x-iy$. We have 
$$[m_z, x+iy] = y - ix = -i(x+iy)$$
or $$m_z(x+iy) - (x+iy)(m_z + \hbar) = 0,$$
which is again of the type (81). It gives 
$$m_z' - m_z'' - \hbar = 0$$
as the condition that $\langle m_z'|x+iy|m_z''\rangle$ shall not vanish. Similarly, 
$$m_z' - m_z'' + \hbar = 0$$
is the condition that $\langle m_z'|x-iy|m_z''\rangle$ shall not vanish. Hence 
$$\langle m_z'|x-iy|m_z'-\hbar\rangle = 0$$
or $$\langle m_z'|x|m_z'-\hbar\rangle = i\langle m_z'|y|m_z'-\hbar\rangle = (a+ib)e^{i\omega t}$$
say, $a$, $b$ and $\omega$ being real. The conjugate complex of this is 
$$\langle m_z'-\hbar|x|m_z'\rangle = -i\langle m_z'-\hbar|y|m_z'\rangle = (a-ib)e^{-i\omega t}.$$

Thus the vector $\tfrac12\{\langle m_z'|\mathbf{D}|m_z'-\hbar\rangle + \langle m_z'-\hbar|\mathbf{D}|m_z'\rangle\}$, which determines the state 
of polarization of the radiation associated with transitions for which $m_z'' = m_z'-\hbar$, 
has the following three components 
$$\left.\begin{aligned}
\tfrac12\{\langle m_z'|x|m_z'-\hbar\rangle + \langle m_z'-\hbar|x|m_z'\rangle\} &= \tfrac12\{(a+ib)e^{i\omega t} + (a-ib)e^{-i\omega t}\} = a\cos\omega t - b\sin\omega t, \\
\tfrac12\{\langle m_z'|y|m_z'-\hbar\rangle + \langle m_z'-\hbar|y|m_z'\rangle\} &= \tfrac12 i\{-(a+ib)e^{i\omega t} + (a-ib)e^{-i\omega t}\} = a\sin\omega t + b\cos\omega t, \\
\tfrac12\{\langle m_z'|z|m_z'-\hbar\rangle + \langle m_z'-\hbar|z|m_z'\rangle\} &= 0.
\end{aligned}\right\} \tag{83}$$
From the form of these components we see that the associated radiation moving 
in the $z$-direction will be circularly polarized, that moving in any direction 
in the $xy$-plane will be linearly polarized in this plane, and that moving in 
intermediate directions will be elliptically polarized. The direction of circular 
polarization for radiation moving in the $z$-direction will depend on whether $\omega$ 
is positive or negative, and this will depend on which of the two states $m_z'$ or 
$m_z'' = m_z' - \hbar$ has the greater energy. 
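
The components (83) can be evaluated numerically to see the circular motion in the $xy$-plane. This is a sketch only; the function name and the sample values of $a$, $b$ and $\omega$ are arbitrary choices for illustration.

```python
import cmath
import math


def polarization_vector(a, b, omega, t):
    """Components (83) of the real vector built from the two matrix elements."""
    amp = complex(a, b) * cmath.exp(1j * omega * t)  # <m'|x|m'-hbar>
    x_elem = amp
    y_elem = -1j * amp                               # <m'|y|m'-hbar> = -i <m'|x|m'-hbar>
    # Half the sum of an element and its conjugate complex is its real part.
    x = 0.5 * (x_elem + x_elem.conjugate()).real
    y = 0.5 * (y_elem + y_elem.conjugate()).real
    return x, y, 0.0


if __name__ == "__main__":
    a, b, omega = 1.0, 2.0, 3.0
    for step in range(5):
        t = 0.1 * step
        x, y, z = polarization_vector(a, b, omega, t)
        print(f"t={t:.1f}  x={x:+.4f}  y={y:+.4f}  |xy|={math.hypot(x, y):.4f}")
```

The length of the $(x, y)$ vector stays equal to $\sqrt{a^2+b^2}$ while its direction rotates, which is the circular polarization seen along the $z$-direction.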

We shall now determine the selection rule for $k$. We have 
$$\begin{aligned}
[k(k+\hbar), z] &= [m_x^2, z] + [m_y^2, z] \\
&= -ym_x - m_x y + xm_y + m_y x \\
&= 2(m_y x - m_x y + i\hbar z) \\
&= 2(m_y x - ym_x) = 2(xm_y - m_x y).
\end{aligned}$$
Similarly, $$[k(k+\hbar), x] = 2(ym_z - m_y z)$$
and $$[k(k+\hbar), y] = 2(m_x z - xm_z).$$
Hence 
$$\begin{aligned}
[k(k+\hbar), [k(k+\hbar), z]] &= 2[k(k+\hbar), m_y x - m_x y + i\hbar z] \\
&= 2m_y[k(k+\hbar), x] - 2m_x[k(k+\hbar), y] + 2i\hbar[k(k+\hbar), z] \\
&= 4m_y(ym_z - m_y z) - 4m_x(m_x z - xm_z) + 2\{k(k+\hbar)z - zk(k+\hbar)\} \\
&= 4(m_x x + m_y y + m_z z)m_z - 4(m_x^2 + m_y^2 + m_z^2)z + 2\{k(k+\hbar)z - zk(k+\hbar)\}.
\end{aligned}$$
From (22) $$m_x x + m_y y + m_z z = 0 \tag{84}$$
and hence $$[k(k+\hbar), [k(k+\hbar), z]] = -2\{k(k+\hbar)z + zk(k+\hbar)\},$$
which gives 
$$k^2(k+\hbar)^2 z - 2k(k+\hbar)zk(k+\hbar) + zk^2(k+\hbar)^2 - 2\hbar^2\{k(k+\hbar)z + zk(k+\hbar)\} = 0. \tag{85}$$
Similar equations hold for $x$ and $y$. These equations are of the required type (81), 
and give us the selection rule 
$$k'^2(k'+\hbar)^2 - 2k'(k'+\hbar)k''(k''+\hbar) + k''^2(k''+\hbar)^2 - 2\hbar^2 k'(k'+\hbar) - 2\hbar^2 k''(k''+\hbar) = 0,$$
which reduces to 
$$(k'+k''+2\hbar)(k'+k'')(k'-k''+\hbar)(k'-k''-\hbar) = 0.$$

A transition can take place between two states $k'$ and $k''$ only if one of these four 
factors vanishes. 
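
The reduction of the quartic condition to four factors is easy to confirm by brute force. The sketch below assumes $\hbar = 1$ by default and eigenvalues $k = 0, 1, 2, \ldots$; the function names are illustrative.

```python
def k_rule(k1, k2, hbar=1):
    """Selection-rule expression obtained from (85), with A = k'(k'+hbar), B = k''(k''+hbar)."""
    A = k1 * (k1 + hbar)
    B = k2 * (k2 + hbar)
    return A**2 - 2 * A * B + B**2 - 2 * hbar**2 * A - 2 * hbar**2 * B


def k_rule_factored(k1, k2, hbar=1):
    """The same expression written as the product of four factors."""
    return (k1 + k2 + 2 * hbar) * (k1 + k2) * (k1 - k2 + hbar) * (k1 - k2 - hbar)


if __name__ == "__main__":
    values = range(0, 6)
    # The two forms agree for every pair of eigenvalues tried.
    assert all(k_rule(p, q) == k_rule_factored(p, q) for p in values for q in values)
    # Pairs for which the expression vanishes.
    print([(p, q) for p in values for q in values if k_rule(p, q) == 0])
```

Apart from the pair $(0, 0)$, which comes from the factor $(k'+k'')$, every pair printed has $k'$ and $k''$ differing by one unit.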

Now the first of the factors, $(k'+k''+2\hbar)$, can never vanish, since the eigenvalues 
of $k$ are all positive or zero. The second, $(k'+k'')$, can vanish only if $k' = 0$ and 
$k'' = 0$. But transitions between two states with these values for $k$ cannot occur 
on account of other selection rules, as may be seen from the following argument. 
If two states (labelled respectively with a single prime and a double prime) are 
such that $k' = 0$ and $k'' = 0$, then from (41) and the corresponding results 
for $m_x$ and $m_y$, $m_x' = m_y' = m_z' = 0$ and $m_x'' = m_y'' = m_z'' = 0$. The selection 
rule for $m_z$ now shows that the matrix elements of $x$ and $y$ referring to the two 
states must vanish, as the value of $m_z$ does not change during the transition, 
and the similar selection rule for $m_x$ or $m_y$ shows that the matrix element of $z$ also 
vanishes. Thus transitions between the two states cannot occur. Our selection 
rule for $k$ now reduces to 
$$(k'-k''+\hbar)(k'-k''-\hbar) = 0,$$
showing that $k$ must change by $\pm\hbar$. This selection rule may be written 
$$k'^2 - 2k'k'' + k''^2 - \hbar^2 = 0,$$
and since this is the condition that a matrix element $\langle k'|z|k''\rangle$ shall not vanish, 
we get the equation $$k^2 z - 2kzk + zk^2 - \hbar^2 z = 0$$
or $$[k, [k, z]] = -z, \tag{86}$$
a result which could not easily be obtained in a more direct way. 

As a final example we shall obtain the selection rule for the magnitude $K$ 
of the total angular momentum $\mathbf{M}$ of a general atomic system. Let $x$, $y$, $z$ 
be the coordinates of one of the electrons. We must obtain the condition that 
the $(K', K'')$ matrix element of $x$, $y$ or $z$ shall not vanish. This is evidently the same 
as the condition that the $(K', K'')$ matrix element of $\lambda_1$, $\lambda_2$ or $\lambda_3$ shall not vanish, 
where $\lambda_1$, $\lambda_2$ and $\lambda_3$ are any three independent linear functions of $x$, $y$ and $z$ with 
numerical coefficients, or more generally with any coefficients that commute with 
$K$ and are thus represented by matrices which are diagonal with respect to $K$. Let 
$$\begin{aligned}
\lambda_0 &= M_x x + M_y y + M_z z, \\
\lambda_x &= M_y z - M_z y - i\hbar x, \\
\lambda_y &= M_z x - M_x z - i\hbar y, \\
\lambda_z &= M_x y - M_y x - i\hbar z.
\end{aligned}$$

We have $$M_x\lambda_x + M_y\lambda_y + M_z\lambda_z = \sum_{xyz}(M_x M_y z - M_x M_z y - i\hbar M_x x) = \sum_{xyz}(M_x M_y - M_y M_x - i\hbar M_z)z = 0 \tag{87}$$
from (29). Thus $\lambda_x$, $\lambda_y$ and $\lambda_z$ are not linearly independent functions of 
$x$, $y$ and $z$. Any two of them, however, together with $\lambda_0$ are three linearly 
independent functions of $x$, $y$ and $z$ and may be taken as the above $\lambda_1$, $\lambda_2$, $\lambda_3$, 
since the coefficients $M_x$, $M_y$, $M_z$ all commute with $K$. Our problem thus reduces 
to finding the condition that the $(K', K'')$ matrix elements of $\lambda_0$, $\lambda_x$, $\lambda_y$ and $\lambda_z$ 
shall not vanish. The physical meanings of these $\lambda$’s are that $\lambda_0$ is proportional to 
the component of the vector $(x, y, z)$ in the direction of the vector $\mathbf{M}$, and $\lambda_x$, $\lambda_y$, $\lambda_z$ 
are proportional to the Cartesian components of the component of $(x, y, z)$ 
perpendicular to $\mathbf{M}$. 

Since $\lambda_0$ is a scalar it must commute with $K$. It follows that only the diagonal 
elements $\langle K'|\lambda_0|K'\rangle$ of $\lambda_0$ can differ from zero, so the selection rule is that $K$ 
cannot change so far as $\lambda_0$ is concerned. Applying (30) to the vector* $(\lambda_x, \lambda_y, \lambda_z)$ 
we have $$[M_z, \lambda_x] = \lambda_y, \qquad [M_z, \lambda_y] = -\lambda_x, \qquad [M_z, \lambda_z] = 0.$$
These relations between $M_z$ and $\lambda_x$, $\lambda_y$, $\lambda_z$ are of exactly the same form as 
the relations (23) and (24) between $m_z$ and $x$, $y$, $z$ and also (87) is of the same 
form as (84). The dynamical variables $\lambda_x$, $\lambda_y$, $\lambda_z$ thus have the same properties 
relative to the angular momentum $\mathbf{M}$ as $x$, $y$, $z$ have relative to $\mathbf{m}$. The deduction 
of the selection rule for $k$ when the electric displacement is proportional to $(x, y, z)$ 
can therefore be taken over and applied to the selection rule for $K$ when the electric 
displacement is proportional to $(\lambda_x, \lambda_y, \lambda_z)$. We find in this way that, so far as 
$\lambda_x$, $\lambda_y$, $\lambda_z$ are concerned, the selection rule for $K$ is that it must change by $\pm\hbar$. 

Collecting results, we have as the selection rule for K that it must change by 
$0$ or $\pm\hbar$. We have considered the electric displacement produced by only one of 
the electrons, but the same selection rule must hold for each electron and thus also 
for the total electric displacement. 


41. The Zeeman effect for the hydrogen atom 

We shall now consider the system of a hydrogen atom in a uniform magnetic field. 
The Hamiltonian (57) with $V = -e^2/r$, which describes the hydrogen atom in no 
external field, gets modified by the magnetic field, the modification, according to 


*[The original component list is unbracketed.] 
classical mechanics, consisting in the replacement of the components of momentum, 
$p_x$, $p_y$, $p_z$, by $p_x + (e/c)A_x$, $p_y + (e/c)A_y$, $p_z + (e/c)A_z$, where $A_x$, $A_y$, $A_z$ are 
the components of the vector potential describing the field. For a uniform field of 
magnitude $\mathcal{H}$ in the direction of the $z$-axis we may take $A_x = -\tfrac12\mathcal{H}y$, $A_y = \tfrac12\mathcal{H}x$, 
$A_z = 0$. The classical Hamiltonian will then be 
$$H = \frac{1}{2m}\left\{\left(p_x - \frac{1}{2}\frac{e}{c}\mathcal{H}y\right)^2 + \left(p_y + \frac{1}{2}\frac{e}{c}\mathcal{H}x\right)^2 + p_z^2\right\} - \frac{e^2}{r}.$$


This classical Hamiltonian may be taken over into the quantum theory if we add 
on to it a term giving the effect of the spin of the electron. According to 
experimental evidence and according to the theory of Chapter XI, the electron has 
a magnetic moment $-(e\hbar/2mc)\boldsymbol{\sigma}$ where $\boldsymbol{\sigma}$ is the spin vector of §37. The energy of 
this magnetic moment in the magnetic field will be $(e\hbar\mathcal{H}/2mc)\sigma_z$. Thus the total 
quantum Hamiltonian will be 
$$H = \frac{1}{2m}\left\{\left(p_x - \frac{1}{2}\frac{e}{c}\mathcal{H}y\right)^2 + \left(p_y + \frac{1}{2}\frac{e}{c}\mathcal{H}x\right)^2 + p_z^2\right\} - \frac{e^2}{r} + \frac{e\hbar\mathcal{H}}{2mc}\sigma_z. \tag{88}$$


There ought strictly to be other terms in this Hamiltonian giving the interaction 
of the magnetic moment of the electron with the electric field of the nucleus of 
the atom, but this effect is small, of the same order of magnitude as the correction 
one gets by taking relativistic mechanics into account, and will be neglected here. 
It will be taken into account in the relativistic theory of the electron given in 
Chapter XI. 

If the magnetic field is not too large, we can neglect terms involving $\mathcal{H}^2$, so that 
the Hamiltonian (88) reduces to 
$$\begin{aligned}
H &= \frac{1}{2m}(p_x^2 + p_y^2 + p_z^2) - \frac{e^2}{r} + \frac{e\mathcal{H}}{2mc}(xp_y - yp_x) + \frac{e\hbar\mathcal{H}}{2mc}\sigma_z \\
&= \frac{1}{2m}(p_x^2 + p_y^2 + p_z^2) - \frac{e^2}{r} + \frac{e\mathcal{H}}{2mc}(m_z + \hbar\sigma_z).
\end{aligned} \tag{89}$$

The extra terms due to the magnetic field are now $(e\mathcal{H}/2mc)(m_z + \hbar\sigma_z)$. 
But these extra terms commute with the total Hamiltonian and are thus 
constants of the motion. This makes the problem very easy. The stationary 
states of the system, i.e. the eigenstates of the Hamiltonian (89), will be those 
eigenstates of the Hamiltonian for no field that are simultaneously eigenstates 
of the observables $m_z$ and $\sigma_z$, or at least of the one observable $m_z + \hbar\sigma_z$, 
and the energy-levels of the system will be those for the system with no field, 
given by (80) if one considers only closed states, increased by an eigenvalue of 
$(e\mathcal{H}/2mc)(m_z + \hbar\sigma_z)$. Thus stationary states of the system with no field for 
which $m_z$ has the numerical value $m_z'$, an integral multiple of $\hbar$, and for which 
also $\sigma_z$ has the numerical value $\sigma_z' = \pm 1$, will still be stationary states when the 


field is applied. Their energy will be increased by an amount consisting of the sum 
of two parts, a part $(e\mathcal{H}/2mc)m_z'$ arising from the orbital motion, which part 
may be considered as due to an orbital magnetic moment $-em_z'/2mc$, and a part† 
$(e\mathcal{H}/2mc)\hbar\sigma_z'$ arising from the spin. The ratio of the orbital magnetic moment to 
the orbital angular momentum $m_z'$ is $-e/2mc$, which is half the ratio of the spin 
magnetic moment to the spin angular momentum. This fact is sometimes referred 
to as the magnetic anomaly of the spin. 

Since the energy-levels now involve $m_z$, the selection rule for $m_z$ obtained 
in the preceding section becomes capable of direct comparison with experiment. 
We take a Heisenberg representation in which, among other constants of 
the motion, $m_z$ and $\sigma_z$ are diagonal. The selection rule for $m_z$ now requires 
$m_z$ to change by $\hbar$, $0$, or $-\hbar$, while $\sigma_z$, since it commutes with the electric 
displacement, will not change at all. Thus the energy difference between the two 
states taking part in the transition process will differ by an amount $e\hbar\mathcal{H}/2mc$, $0$, 
or $-e\hbar\mathcal{H}/2mc$ from its value for no magnetic field. Hence, from Bohr’s frequency 
condition, the frequency of the associated electromagnetic radiation will differ 
by $e\mathcal{H}/4\pi mc$, $0$, or $-e\mathcal{H}/4\pi mc$ from that for no magnetic field. This means 
that each spectral line for no magnetic field gets split up by the field into three 
components. If one considers radiation moving in the $z$-direction, then from (83) 
the two outer components will be circularly polarized, while the central undisplaced 
one will be of zero intensity. These results are in agreement with experiment and 
also with the classical theory of the Zeeman effect. 
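
To get a feeling for the size of the splitting, the frequency shift can be evaluated for a laboratory field. The text works in Gaussian units, where the shift is $e\mathcal{H}/4\pi mc$; the sketch below assumes the equivalent SI expression $eB/4\pi m$ with $B$ in tesla, and the numerical constants and function names are supplied here for illustration and are not taken from the text.

```python
import math

ELECTRON_CHARGE = 1.602176634e-19   # coulomb (magnitude), assumed SI value
ELECTRON_MASS = 9.1093837015e-31    # kilogram, assumed SI value


def zeeman_frequency_shifts(field_tesla):
    """Shifts in hertz of the three components of a line (SI form e B / 4 pi m)."""
    delta = ELECTRON_CHARGE * field_tesla / (4 * math.pi * ELECTRON_MASS)
    return (-delta, 0.0, delta)


def line_shifts_in_units(m_values=(-1, 0, 1), sigma_values=(-1, 1)):
    """Distinct shifts of the transition energy in units of e hbar H / 2mc.

    In these units a level is shifted by m + sigma, with m = m_z'/hbar.
    Transitions change m by 0 or +-1 and leave sigma unchanged.
    """
    shifts = set()
    for sigma in sigma_values:
        for m_initial in m_values:
            for delta_m in (-1, 0, 1):
                m_final = m_initial + delta_m
                shifts.add((m_initial + sigma) - (m_final + sigma))
    return sorted(shifts)


if __name__ == "__main__":
    print([s / 1e9 for s in zeeman_frequency_shifts(1.0)], "GHz at 1 tesla")
    print(line_shifts_in_units())
```

Because $\sigma_z$ does not change in a transition, the spin term drops out of the differences and only three distinct shifts remain, whatever the values of $m_z'$ and $\sigma_z'$.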


†[Original has a h incorrectly.] 

42. General remarks 

In the preceding chapter exact treatments were given of some simple dynamical 
systems in the quantum theory. Most quantum problems, however, cannot be 
solved exactly with the present resources of mathematics, as they lead to equations 
whose solutions cannot be expressed in finite terms with the help of the ordinary 
functions of analysis. For such problems one can often use a perturbation method.† 
This consists in splitting up the Hamiltonian into two parts, one of which 
must be simple and the other small. The first part may then be considered 
as the Hamiltonian of a simplified or unperturbed system, which can be dealt 
with exactly, and the addition of the second will then require small corrections, 
of the nature of a perturbation, in the solution for the unperturbed system. 
The requirement that the first part shall be simple requires in practice that it shall 
not involve the time explicitly. If the second part contains a small numerical 
factor $\epsilon$, we can obtain the solution of our equations for the perturbed system in 
the form of a power series in $\epsilon$, which, provided it converges, will give the answer 
to our problem with any desired accuracy. Even when the series does not converge, 
the first approximation obtained by means of it is usually fairly accurate. 

There are two distinct methods in perturbation theory. In one of 
these the perturbation is considered as causing a modification of the states 
of motion of the unperturbed system. In the other we do not consider 
any modification to be made in the states of the unperturbed system, 
but we suppose that the perturbed system, instead of remaining permanently 
in one of these states, is continually changing from one to another, or making 
transitions, under the influence of the perturbation. Which method is to 
be used in any particular case depends on the nature of the problem to be 
solved. The first method is useful usually only when the perturbing energy 
(the correction in the Hamiltonian for the undisturbed system) does not involve 
the time explicitly, and is then applied to the stationary states. It can be 
used for calculating things that do not refer to any definite time, such as the 
energy-levels of the stationary states of the perturbed system, or, in the case of 


†[No other methods are considered and the work was published before ready access to 
digital computers.] 
collision problems, the probability of scattering through a given angle. The second 
method must, on the other hand, be used for solving all problems involving 
a consideration of time, such as those about the transient phenomena that occur 
when the perturbation is suddenly applied, or more generally problems in which 
the perturbation varies with the time in any way (i.e. in which the perturbing 
energy involves the time explicitly). Again, this second method must be used 
in collision problems, even though the perturbing energy does not here involve 
the time explicitly, if one wishes to calculate absorption and emission probabilities, 
since these probabilities, unlike a scattering probability, cannot be defined without 
reference to a state of affairs that varies with the time. 

One can summarize the distinctive features of the two methods by saying that, 
with the first method, one compares the stationary states of the perturbed 
system with those of the unperturbed system; with the second method one takes 
a stationary state of the unperturbed system and sees how it varies with time 
under the influence of the perturbation. 


43. The change in the energy-levels caused by a perturbation 
The first of the above-mentioned methods will now be applied to the calculation of 
the changes in the energy-levels of a system caused by a perturbation. We assume 
the perturbing energy, like the Hamiltonian for the unperturbed system, 
not to involve the time explicitly. Our problem has a meaning, of course, 
only provided the energy-levels of the unperturbed system are discrete and 
the differences between them are large compared with the changes in them caused 
by the perturbation. This circumstance results in the treatment of perturbation 
problems by the first method having some different features according to whether 
the energy-levels of the unperturbed system are discrete or continuous. 

Let the Hamiltonian of the perturbed system be $$H = E + V, \tag{1}$$
$E$ being the Hamiltonian of the unperturbed system and $V$ the small perturbing 
energy. By hypothesis each eigenvalue $H'$ of $H$ lies very close to one and only 
one eigenvalue $E'$ of $E$. We shall use the same number of primes to specify any 
eigenvalue of $H$ and the eigenvalue of $E$ to which it lies very close. Thus we shall 
have $H''$ differing from $E''$ by a small quantity of order $V$ and differing from $E'$ 
by a quantity that is not small unless $E' = E''$. We must now take care always 
to use different numbers of primes to specify eigenvalues of $H$ and $E$ which we do 
not want to lie very close together. 

To obtain the eigenvalues of $H$, we have to solve the equation 
$$H|H'\rangle = H'|H'\rangle$$
or $$(H' - E)|H'\rangle = V|H'\rangle. \tag{2}$$
Let $|0\rangle$ be an eigenket of $E$ belonging to the eigenvalue $E'$ and suppose the $|H'\rangle$ 
and $H'$ that satisfy (2) to differ from $|0\rangle$ and $E'$ only by small quantities and to be 
expressed as $$\left.\begin{aligned} |H'\rangle &= |0\rangle + |1\rangle + |2\rangle + \cdots, \\ H' &= E' + a_1 + a_2 + \cdots, \end{aligned}\right\} \tag{3}$$
where $|1\rangle$ and $a_1$ are of the first order of smallness (i.e. the same order as $V$), 
$|2\rangle$ and $a_2$ are of the second order, and so on. Substituting these expressions 
in (2), we obtain 
$$\{E' - E + a_1 + a_2 + \cdots\}\{|0\rangle + |1\rangle + |2\rangle + \cdots\} = V\{|0\rangle + |1\rangle + \cdots\}.$$
If we now separate the terms of zero order, of the first order, of the second order, 
and so on, we get the following set of equations, 
$$\left.\begin{aligned}
(E' - E)|0\rangle &= 0, \\
(E' - E)|1\rangle + a_1|0\rangle &= V|0\rangle, \\
(E' - E)|2\rangle + a_1|1\rangle + a_2|0\rangle &= V|1\rangle, \\
&\;\;\vdots
\end{aligned}\right\} \tag{4}$$

The first of these equations tells us, what we have already assumed, that $|0\rangle$ is 
an eigenket of $E$ belonging to the eigenvalue $E'$. The others enable us to calculate 
the various corrections $|1\rangle$, $|2\rangle$, ..., $a_1$, $a_2$, .... 

For the further discussion of these equations it is convenient to introduce 
a representation in which $E$ is diagonal, i.e. a Heisenberg representation for 
the unperturbed system, and to take $E$ itself as one of the observables whose 
eigenvalues label the representatives. Let the others, in the event of others being 
necessary, as is the case when there is more than one eigenstate of $E$ belonging to 
any eigenvalue, be called $\beta$’s. A basic bra is then $\langle E''\beta''|$. Since $|0\rangle$ is an eigenket 
of $E$ belonging to the eigenvalue $E'$, we have $$\langle E''\beta''|0\rangle = \delta_{E''E'}\,f(\beta''), \tag{5}$$
where $f(\beta'')$ is some function of the variables $\beta''$. With the help of this result 
the second of equations (4), written in terms of representatives, becomes 
$$(E' - E'')\langle E''\beta''|1\rangle + a_1\delta_{E''E'}\,f(\beta'') = \sum_{\beta'}\langle E''\beta''|V|E'\beta'\rangle f(\beta'). \tag{6}$$
Putting $E'' = E'$ here, we get $$a_1 f(\beta'') = \sum_{\beta'}\langle E'\beta''|V|E'\beta'\rangle f(\beta'). \tag{7}$$

Equation (7) is of the form of the standard equation in the theory of eigenvalues, 
so far as the variables $\beta'$ are concerned. It shows that the various possible values 
for $a_1$ are the eigenvalues of the matrix $\langle E'\beta''|V|E'\beta'\rangle$. This matrix is a part of 
the representative of the perturbing energy in the Heisenberg representation for 
the unperturbed system, namely, the part consisting of those elements that refer 
to the same unperturbed energy-level $E'$ for their row and column. Each of these 
values for $a_1$ gives, to the first order, an energy-level of the perturbed system lying 
close to the energy-level $E'$ of the unperturbed system.* There may thus be several 
energy-levels of the perturbed system lying close to the one energy-level $E'$ of 
the unperturbed system, their number being anything not exceeding the number 
of independent states of the unperturbed system belonging to the energy-level $E'$. 
In this way the perturbation may cause a separation or partial separation of 
the energy-levels that coincide at $E'$ for the unperturbed system. 
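
As a small illustration of (7), consider an unperturbed level to which just two independent states belong, so that the relevant part of the perturbing energy is a 2 by 2 matrix. The sketch below assumes this matrix is real and symmetric and uses the closed formula for its eigenvalues; the function name and the sample numbers are hypothetical.

```python
import math


def first_order_shifts(v11, v12, v22):
    """Eigenvalues a_1 of the real symmetric matrix [[v11, v12], [v12, v22]].

    This matrix stands for the elements of V referring to one doubly
    degenerate unperturbed level (an assumed special case of equation (7)).
    """
    mean = 0.5 * (v11 + v22)
    radius = math.hypot(0.5 * (v11 - v22), v12)
    return mean - radius, mean + radius


if __name__ == "__main__":
    # Off-diagonal coupling only: the level separates into two.
    print(first_order_shifts(0.0, 0.1, 0.0))
    # A multiple of the unit matrix: the level is displaced but not separated.
    print(first_order_shifts(0.2, 0.0, 0.2))
```

In the first case the two values of $a_1$ differ and the level is separated; in the second they coincide, an instance of the case in which no separation occurs to the first order.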

Equation (7) also determines, to the zero order, the representatives $\langle E''\beta''|0\rangle$ of 
the stationary states of the perturbed system belonging to energy-levels lying close 
to $E'$, any solution $f(\beta')$ of (7) substituted in (5) giving one such representative. 
Each of these stationary states of the perturbed system approximates to one of 
the stationary states of the unperturbed system, but the converse, that each 
stationary state of the unperturbed system approximates to one of the stationary 
states of the perturbed system, is not true, since the general stationary state 
of the unperturbed system belonging to the energy-level $E'$ is represented by 
the right-hand side of (5) with an arbitrary function $f(\beta'')$. The problem of finding 
which stationary states of the unperturbed system approximate to stationary 
states of the perturbed system, i.e. the problem of finding the solutions $f(\beta')$ 
of (7), corresponds to the problem of ‘secular perturbations’ in classical mechanics. 
It should be noted that the above results are independent of the values of all those 
matrix elements of the perturbing energy which refer to two different energy-levels 
of the unperturbed system. 

Let us see what the above results become in the specially simple case when 
there is only one stationary state of the unperturbed system belonging to each 
energy-level.§ In this case $E$ alone fixes the representation, no $\beta$’s being required. 
The sum in (7) now reduces to a single term and we get 
$$a_1 = \langle E'|V|E'\rangle. \tag{8}$$
There is only one energy-level of the perturbed system lying close to any 
energy-level of the unperturbed system and the change in energy is equal, 
in the first order, to the corresponding diagonal element of the perturbing energy 
in the Heisenberg representation for the unperturbed system, or to the average 
value of the perturbing energy for the corresponding unperturbed state. The latter 
formulation of the result is the same as in classical mechanics when the unperturbed 
system is multiply periodic. 


*To distinguish these energy-levels one from another we should require some more elaborate 
notation, since according to the present notation they must all be specified by the same number of 
primes, namely by the number of primes specifying the energy-level of the unperturbed system 
from which they arise. For our present purposes, however, this more elaborate notation is 
not required. 

§A system with only one stationary state belonging to each energy-level is often called 
non-degenerate and one with two or more stationary states belonging to an energy-level is called 
degenerate, although these words are not very appropriate from the modern point of view. 

We shall proceed to calculate the second-order correction $a_2$ in the energy-level 
for the case when the unperturbed system is non-degenerate. Equation (5) for this 
case reads $$\langle E''|0\rangle = \delta_{E''E'},$$
with neglect of an unimportant numerical factor, and equation (6) reads 
$$(E' - E'')\langle E''|1\rangle + a_1\delta_{E''E'} = \langle E''|V|E'\rangle.$$
This gives us the value of $\langle E''|1\rangle$ when $E'' \neq E'$, namely 
$$\langle E''|1\rangle = \frac{\langle E''|V|E'\rangle}{E' - E''}. \tag{9}$$
The third of equations (4), written in terms of representatives, becomes 
$$(E' - E'')\langle E''|2\rangle + a_1\langle E''|1\rangle + a_2\delta_{E''E'} = \sum_{E'''}\langle E''|V|E'''\rangle\langle E'''|1\rangle.$$
Putting $E'' = E'$ here, we get $$a_1\langle E'|1\rangle + a_2 = \sum_{E''}\langle E'|V|E''\rangle\langle E''|1\rangle,$$
which reduces, with the help of (8), to 
$$a_2 = \sum_{E'' \neq E'}\langle E'|V|E''\rangle\langle E''|1\rangle.$$
Substituting for $\langle E''|1\rangle$ from (9), we obtain finally 
$$a_2 = \sum_{E'' \neq E'}\frac{\langle E'|V|E''\rangle\langle E''|V|E'\rangle}{E' - E''},$$
giving for the total energy change to the second order 
$$a_1 + a_2 = \langle E'|V|E'\rangle + \sum_{E'' \neq E'}\frac{\langle E'|V|E''\rangle\langle E''|V|E'\rangle}{E' - E''}. \tag{10}$$
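
Formula (10) can be tried out on a system small enough to be solved exactly. The following sketch assumes a non-degenerate system with two levels and a real symmetric perturbing matrix; the function names and the numerical values are chosen here purely for illustration.

```python
import math


def energy_to_second_order(levels, V, index):
    """Unperturbed level plus a_1 + a_2 from (8) and (10); V is real symmetric."""
    a1 = V[index][index]
    a2 = sum(
        V[index][j] * V[j][index] / (levels[index] - levels[j])
        for j in range(len(levels))
        if j != index
    )
    return levels[index] + a1 + a2


def exact_two_level(levels, V):
    """Exact eigenvalues of the 2 by 2 matrix E + V, for comparison."""
    h11 = levels[0] + V[0][0]
    h22 = levels[1] + V[1][1]
    h12 = V[0][1]
    mean = 0.5 * (h11 + h22)
    radius = math.hypot(0.5 * (h11 - h22), h12)
    return mean - radius, mean + radius


if __name__ == "__main__":
    levels = [0.0, 1.0]
    V = [[0.01, 0.01], [0.01, -0.01]]
    exact = exact_two_level(levels, V)
    for index in (0, 1):
        approx = energy_to_second_order(levels, V, index)
        print(index, approx, exact[index], abs(approx - exact[index]))
```

With a perturbation of order $10^{-2}$ the remaining discrepancy is of order $10^{-6}$, that is, of the third order of smallness, as one would expect from stopping at $a_2$.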

The method may be developed for the calculation of the higher approximations 
if required. General recurrence formulas giving the nth order corrections in terms 
of those of lower order have been obtained by Max Born, Werner Heisenberg and 
Pascual Jordan.† 


44. The perturbation considered as causing transitions 

We shall now consider the second of the two perturbation methods mentioned 
in §42. We suppose again that we have an unperturbed system governed by 
a Hamiltonian $E$ which does not involve the time explicitly, and a perturbing 
energy $V$ which can now be an arbitrary function of the time. The Hamiltonian 
for the perturbed system is again $H = E + V$. For the present method it does not 
make any essential difference whether the energy-levels of the unperturbed system, 


‘Heisenberg, ,,Uber quantentheoretische Umdeutung kinematischer und mechanischer 
Beziehungen“ Zeitschrift fiir Physik, 33, 1925, pp. 879-893. [doi: 10.1007/BF01328377 | 
With: Born and Jordan ,,Zur Quantenmechanik“ Zeitschrift fiir Physik, 34, 1925, 
pp. 858-888. [doi: 10.1007/BF01328531 | With: Born, Heisenberg and Jordan ,,Zur 
Quantenmechanik II“ Zeitschrift fiir Physik, 35, 1926, pp. 557-615. [doi: 10.1007/BF01379806 | 
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i.e. the eigenvalues of EF, form a discrete or continuous set. We shall, 
however, take the discrete case, for definiteness. We shall again work with 
a Heisenberg representation for the unperturbed system, but as there will now 
be no advantage in taking FE itself as one of the observables whose eigenvalues 
label the representatives, we shall suppose we have a general set of a’s to label 
the representatives. 

Let us suppose that at the initial time $t_0$ the system is in a state for 
which the $\alpha$’s certainly have the values $\alpha'$. The ket corresponding to this state 
is the basic ket $|\alpha'\rangle$. If there were no perturbation, i.e. if the Hamiltonian were $E$, 
this state would be stationary. The perturbation causes the state to change. 
At time $t$ the ket corresponding to the state in Schrödinger’s picture will be $T|\alpha'\rangle$, 
according to equation (1) of §27. The probability of the $\alpha$’s then having the values 
$\alpha''$ is 
$$P(\alpha', \alpha'') = |\langle\alpha''|T|\alpha'\rangle|^2. \tag{11}$$
For $\alpha'' \neq \alpha'$, $P(\alpha', \alpha'')$ is the probability of a transition taking place from state 
$\alpha'$ to state $\alpha''$ during the time interval $t_0 \to t$, while $P(\alpha', \alpha')$ is the probability 
of no transition taking place at all. The sum of $P(\alpha', \alpha'')$ for all $\alpha''$ is, of course, 
unity. 

Let us now suppose that initially the system, instead of being certainly in 
the state $\alpha'$, is in one or other of various states $\alpha'$ with the probability $P_{\alpha'}$ for each. 
The Gibbs density corresponding to this distribution is, according to (68) of §33, 
$$\rho = \sum_{\alpha'} |\alpha'\rangle P_{\alpha'} \langle\alpha'|. \tag{12}$$
At time $t$, each ket $|\alpha'\rangle$ will have changed to $T|\alpha'\rangle$ and each bra $\langle\alpha'|$ to $\langle\alpha'|\bar{T}$, so $\rho$ 
will have changed to 
$$\rho_t = \sum_{\alpha'} T|\alpha'\rangle P_{\alpha'} \langle\alpha'|\bar{T}. \tag{13}$$
The probability of the $\alpha$’s then having the values $\alpha''$ will be, from (73) of §33, 
$$\langle\alpha''|\rho_t|\alpha''\rangle = \sum_{\alpha'} \langle\alpha''|T|\alpha'\rangle P_{\alpha'} \langle\alpha'|\bar{T}|\alpha''\rangle = \sum_{\alpha'} P_{\alpha'} P(\alpha', \alpha'') \tag{14}$$
with the help of (11). This result expresses that the probability of the system 
being in the state $\alpha''$ at time $t$ is the sum of the probabilities of the system 
being initially in any state $\alpha' \neq \alpha''$, and making a transition from state $\alpha'$ to 
state $\alpha''$ and the probability of its being initially in the state $\alpha''$ and making 
no transition. Thus the various transition probabilities act independently of one 
another, according to the ordinary laws of probability. 
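The equality of the two forms in (14) can be checked numerically on a small example. The sketch below assumes a hypothetical two-state system with an arbitrarily chosen unitary $T$ and initial probabilities; it computes the final probabilities once from the diagonal elements of the transformed Gibbs density and once from the sum over independent transition probabilities. The helper names are illustrative only.

```python
import math

def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(A):
    """Conjugate transpose (the conjugate complex of a linear operator)."""
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]

theta = 0.4                                   # hypothetical mixing angle
c, s = math.cos(theta), math.sin(theta)
T = [[c, -1j * s], [-1j * s, c]]              # a unitary matrix <a''|T|a'>

def P(a1, a2):
    """Transition probability from state a1 to state a2, as in (11)."""
    return abs(T[a2][a1]) ** 2

P0 = [0.7, 0.3]                               # hypothetical initial probabilities
rho = [[P0[0], 0.0], [0.0, P0[1]]]            # Gibbs density (12)
rho_t = matmul(matmul(T, rho), dagger(T))     # density at time t, as in (13)

from_density = [complex(rho_t[k][k]).real for k in range(2)]
from_sum = [sum(P0[a1] * P(a1, a2) for a1 in range(2)) for a2 in range(2)]
print(from_density, from_sum)
```

Both lists agree and each sums to unity, which is the content of (14) for this example.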

The whole problem of calculating transitions thus reduces to the determination 
of the probability amplitudes $\langle\alpha''|T|\alpha'\rangle$. These can be worked out from 
the differential equation for $T$, equation (6) of §27, or 
$$i\hbar\,dT/dt = HT = (E + V)T. \tag{15}$$
The calculation can be simplified by working with 
$$T^* = e^{iE(t-t_0)/\hbar}\,T. \tag{16}$$
We have 
$$i\hbar\,dT^*/dt = e^{iE(t-t_0)/\hbar}(-ET + i\hbar\,dT/dt) = e^{iE(t-t_0)/\hbar}\,VT = V^*T^*, \tag{17}$$
where 
$$V^* = e^{iE(t-t_0)/\hbar}\,V\,e^{-iE(t-t_0)/\hbar}, \tag{18}$$
i.e. $V^*$ is the result of applying a certain unitary transformation to $V$. Equation (17) 
is of a more convenient form than (15), because (17) makes the change in $T^*$ 
depend entirely on the perturbation $V$, and for $V = 0$ it would make $T^*$ equal its 
initial value, namely unity. We have from (16) 
$$\langle\alpha''|T^*|\alpha'\rangle = e^{iE''(t-t_0)/\hbar}\,\langle\alpha''|T|\alpha'\rangle,$$
so that 
$$P(\alpha', \alpha'') = |\langle\alpha''|T^*|\alpha'\rangle|^2, \tag{19}$$
showing that $T^*$ and $T$ are equally good for determining transition probabilities. 

Our work up to the present has been exact. We now assume $V$ is a small 
quantity of the first order and express $T^*$ in the form 
$$T^* = 1 + T_1^* + T_2^* + \cdots, \tag{20}$$
where $T_1^*$ is of the first order, $T_2^*$ is of the second, and so on. Substituting (20) 
into (17) and equating terms of equal order, we get 
$$i\hbar\,dT_1^*/dt = V^*, \qquad i\hbar\,dT_2^*/dt = V^*T_1^*, \qquad \ldots \tag{21}$$
From the first of these equations we obtain 
$$T_1^* = -i\hbar^{-1}\int_{t_0}^{t} V^*(t')\,dt', \tag{22}$$
from the second we obtain 
$$T_2^* = -\hbar^{-2}\int_{t_0}^{t} V^*(t')\,dt' \int_{t_0}^{t'} V^*(t'')\,dt'', \tag{23}$$
and so on. For many practical problems it is sufficiently accurate to retain only 
the term $T_1^*$, which gives for the transition probability $P(\alpha', \alpha'')$ with $\alpha'' \neq \alpha'$ 
$$P(\alpha', \alpha'') = \hbar^{-2}\left|\langle\alpha''|\int_{t_0}^{t} V^*(t')\,dt'\,|\alpha'\rangle\right|^2 = \hbar^{-2}\left|\int_{t_0}^{t} \langle\alpha''|V^*(t')|\alpha'\rangle\,dt'\right|^2. \tag{24}$$
We obtain in this way the transition probability to the second order of accuracy. 
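The formula (24) lends itself to direct numerical evaluation. The following sketch assumes a two-state system in which the matrix element of $V$ between the states is a given function `v(t)` of the time, so that by (18) the integrand is `v(t') * exp(i dE (t' - t0)/hbar)` with `dE` the difference of the two unperturbed energy-levels. Units with $\hbar = 1$, the trapezoidal rule and all names are assumptions made for the illustration. For a constant matrix element the result is compared with the closed form (36) of §46.

```python
import cmath
import math

def first_order_probability(v, dE, t0, t, hbar=1.0, steps=20000):
    """Evaluate (24) for one pair of states by the trapezoidal rule.

    v  : function giving the matrix element <a''|V|a'> at time t'
    dE : E'' - E', the difference of the unperturbed energy-levels
    """
    h = (t - t0) / steps

    def integrand(tp):
        # Matrix element of V*(t') according to (18).
        return v(tp) * cmath.exp(1j * dE * (tp - t0) / hbar)

    total = 0.5 * (integrand(t0) + integrand(t))
    for k in range(1, steps):
        total += integrand(t0 + k * h)
    return abs(total * h / hbar) ** 2

v0, dE, t = 0.01, 1.0, 5.0          # hypothetical values, hbar = 1
numeric = first_order_probability(lambda tp: v0, dE, 0.0, t)
closed = 2 * v0 ** 2 * (1 - math.cos(dE * t)) / dE ** 2
print(numeric, closed)
```

A time-dependent perturbation can be treated in the same way by passing a different function for `v`.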


The result depends only on the matrix element $\langle\alpha''|V^*(t')|\alpha'\rangle$ of $V^*(t')$ referring 
to the two states concerned, with $t'$ going from $t_0$ to $t$. Since $V^*$ is real, like $V$, 
$$\langle\alpha''|V^*(t')|\alpha'\rangle = \overline{\langle\alpha'|V^*(t')|\alpha''\rangle}$$
and hence 
$$P(\alpha', \alpha'') = P(\alpha'', \alpha') \tag{25}$$
to the second order of accuracy. 

Sometimes one is interested in a transition $\alpha' \to \alpha''$ such that the matrix 
element $\langle\alpha''|V^*|\alpha'\rangle$ vanishes, or is small compared with other matrix elements 
of $V^*$. It is then necessary to work to a higher accuracy. If we retain only the terms 
$T_1^*$ and $T_2^*$, we get, for $\alpha'' \neq \alpha'$, 
$$P(\alpha', \alpha'') = \hbar^{-2}\left|\int_{t_0}^{t} \langle\alpha''|V^*(t')|\alpha'\rangle\,dt' - i\hbar^{-1}\sum_{\alpha'''\neq\alpha',\alpha''}\int_{t_0}^{t} \langle\alpha''|V^*(t')|\alpha'''\rangle\,dt' \int_{t_0}^{t'} \langle\alpha'''|V^*(t'')|\alpha'\rangle\,dt''\right|^2. \tag{26}$$
The terms $\alpha''' = \alpha'$ and $\alpha''' = \alpha''$ are omitted from the sum since they are small 
compared with other terms of the sum, on account of the smallness of $\langle\alpha''|V^*|\alpha'\rangle$. 
To interpret the result (26), we may suppose that the term 
$$\int_{t_0}^{t} \langle\alpha''|V^*(t')|\alpha'\rangle\,dt' \tag{27}$$
gives rise to a transition directly from state $\alpha'$ to state $\alpha''$, while the term 
$$-i\hbar^{-1}\int_{t_0}^{t} \langle\alpha''|V^*(t')|\alpha'''\rangle\,dt' \int_{t_0}^{t'} \langle\alpha'''|V^*(t'')|\alpha'\rangle\,dt'' \tag{28}$$
gives rise to a transition from state $\alpha'$ to state $\alpha'''$, followed by a transition 
from state $\alpha'''$ to state $\alpha''$. The state $\alpha'''$ is called an intermediate 
state in this interpretation. We must add the term (27) to the various 
terms (28) corresponding to different intermediate states and then take 
the square of the modulus of the sum, which means that there is interference 
between the different transition processes (the direct one and those involving 
intermediate states) and one cannot give a meaning to the probability for one of 
these processes by itself. For each of these processes, however, there is a probability 
amplitude. If one carries out the perturbation method to a higher degree of 
accuracy, one obtains a result which can be interpreted similarly, with the help of 
more complicated transition processes involving a succession of intermediate states. 
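The interference just described can be made concrete with a few numbers. In the sketch below the complex values standing for the direct term (27) and for two terms of type (28) are purely hypothetical, and the common factor $\hbar^{-2}$ of (26) is left out; the point is only that the square of the modulus of the sum differs from the sum of the squared moduli.

```python
# Hypothetical probability amplitudes (common factor hbar^-2 omitted).
direct = 0.03 + 0.01j                        # term of type (27)
via_intermediate = [-0.02 + 0.015j,          # terms of type (28), one for
                    0.005 - 0.02j]           # each intermediate state

total_amplitude = direct + sum(via_intermediate)

# What (26) prescribes: add the amplitudes, then square the modulus.
p_with_interference = abs(total_amplitude) ** 2

# What one would get by wrongly assigning a probability to each process.
p_without_interference = abs(direct) ** 2 + sum(abs(a) ** 2
                                                for a in via_intermediate)

print(p_with_interference, p_without_interference)
```

With these particular values the amplitudes largely cancel, so the correct probability is much smaller than the sum of the separate squared moduli.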


45. Application to radiation 

In the preceding section a general theory of the perturbation of an atomic system 
was developed, in which the perturbing energy could vary with the time in 
an arbitrary way. A perturbation of this kind can be realized in practice by 
allowing incident electromagnetic radiation to fall on the system. Let us see what 
our result (24) reduces to in this case. 

If we neglect the effects of the magnetic field of the incident radiation, 
and if we further assume that the wave-lengths of the harmonic components of 
this radiation are all large compared with the dimensions of the atomic system, 
then the perturbing energy is simply the scalar product 
$$V = (\mathbf{D}, \boldsymbol{\mathcal{E}}), \tag{29}$$
where $\mathbf{D}$ is the total electric displacement of the system and $\boldsymbol{\mathcal{E}}$ is the electric 
force of the incident radiation. We suppose $\boldsymbol{\mathcal{E}}$ to be a given function of the time. 
If we take for simplicity the case when the incident radiation is plane polarized with 
its electric vector in a certain direction and let $D$ denote the Cartesian component 
of $\mathbf{D}$ in this direction, the expression (29) for $V$ reduces to the ordinary product 
$$V = D\mathcal{E},$$
where $\mathcal{E}$ is the magnitude of the vector $\boldsymbol{\mathcal{E}}$. The matrix elements of $V$ are 
$$\langle\alpha''|V|\alpha'\rangle = \langle\alpha''|D|\alpha'\rangle\,\mathcal{E},$$
since $\mathcal{E}$ is a number. The matrix element $\langle\alpha''|D|\alpha'\rangle$ is independent of $t$. From (18) 
$$\langle\alpha''|V^*(t)|\alpha'\rangle = \langle\alpha''|D|\alpha'\rangle\,e^{i(E''-E')(t-t_0)/\hbar}\,\mathcal{E}(t)$$
and hence the expression (24) for the transition probability becomes 
$$P(\alpha', \alpha'') = \hbar^{-2}\,|\langle\alpha''|D|\alpha'\rangle|^2\left|\int_{t_0}^{t} e^{i(E''-E')(t'-t_0)/\hbar}\,\mathcal{E}(t')\,dt'\right|^2. \tag{30}$$
If the incident radiation during the time interval $t_0$ to $t$ is resolved into its 
Fourier components, the energy crossing unit area per unit frequency range about 
the frequency $\nu$ will be, according to classical electrodynamics, 
$$E_\nu = \frac{c}{2\pi}\left|\int_{t_0}^{t} e^{2\pi i\nu(t'-t_0)}\,\mathcal{E}(t')\,dt'\right|^2. \tag{31}$$
Comparing this with (30), we obtain 
$$P(\alpha', \alpha'') = 2\pi c^{-1}\hbar^{-2}\,|\langle\alpha''|D|\alpha'\rangle|^2\,E_\nu, \tag{32}$$
where 
$$\nu = |E'' - E'|/h. \tag{33}$$


From this result we see in the first place that the transition probability 
depends only on that Fourier component of the incident radiation whose 
frequency $\nu$ is connected with the change of energy by (33). This gives us Bohr’s 
Frequency Condition and shows how the ideas of Bohr’s atomic theory, which was 
the forerunner of quantum mechanics, can be fitted in with quantum mechanics. 
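As a small worked case of the frequency condition (33), the sketch below evaluates $\nu$ for a transition between the two lowest levels of hydrogen. The level formula $E_n = -R/n^2$ with $R \approx 13.6057$ eV is a standard value brought in for the illustration and is not taken from this section; the function names are likewise hypothetical.

```python
PLANCK_H = 6.62607015e-34      # Planck's constant h in J s (exact SI value)
EV = 1.602176634e-19           # joules per electron-volt (exact SI value)
RYDBERG_EV = 13.605693         # assumed hydrogen binding energy in eV

def hydrogen_level(n):
    """Assumed energy-level of hydrogen, in joules, for principal number n."""
    return -RYDBERG_EV * EV / n ** 2

def bohr_frequency(E_initial, E_final):
    """Frequency given by (33): nu = |E'' - E'| / h."""
    return abs(E_final - E_initial) / PLANCK_H

nu_absorption = bohr_frequency(hydrogen_level(1), hydrogen_level(2))
nu_emission = bohr_frequency(hydrogen_level(2), hydrogen_level(1))
print(nu_absorption, nu_emission)
```

The same frequency is obtained for either direction of the transition, since only the modulus of the energy change enters (33).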

The present elementary theory does not tell us anything about the energy of 
the field of radiation. It would be reasonable to assume, though, that the energy 
absorbed or liberated by the atomic system in the transition process comes from 
or goes into the component of the radiation with frequency $\nu$ given by (33). 
This assumption will be justified by the more complete theory of radiation given 
in Chapter X. The result (32) is then to be interpreted as the probability of 
the system, if initially in the state of lower energy, absorbing radiation and being 
carried to the upper state, and if initially in the upper state, being stimulated by 
the incident radiation to emit and fall to the lower state. The present theory does 
not account for the experimental fact that the system, if in the upper state with 
no incident radiation, can emit spontaneously and fall to the lower state, but this 
also will be accounted for by the more complete theory of Chapter X. 

The existence of the phenomenon of stimulated emission was inferred 
by Albert Einstein† long before the discovery of quantum mechanics, 
from a consideration of statistical equilibrium between atoms and a field of 
black-body radiation satisfying Planck’s law. Einstein showed that the transition 
probability for stimulated emission must equal that for absorption between 
the same pair of states, in agreement with the present quantum theory, 
and deduced also a relation connecting this transition probability with that for 
spontaneous emission, which relation is in agreement with the theory of Chapter X. 

† Einstein, A. (1917) Physikalische Zeitschrift, 18, pp. 121-128 [Einstein, A.; B. L. van der 
Waerden (translator and editor) ‘On the quantum theory of radiation.’ Sources of Quantum 
Mechanics, North-Holland, Amsterdam, 1968] 

The matrix element $\langle\alpha''|D|\alpha'\rangle$ in (32) plays the part of the amplitude of one of 
the Fourier components of $D$ in the classical theory of a multiply-periodic system 
interacting with radiation. In fact it was the idea of replacing classical Fourier 
components by matrix elements which led Werner Heisenberg to the discovery of 
quantum mechanics in 1925.* Heisenberg assumed that the formulae describing 
the interaction with radiation of a system in the quantum theory can be 
obtained from the classical formulae by substituting for the Fourier components of 
the total electric displacement of the system the corresponding matrix elements. 
According to this assumption applied to spontaneous emission, a system having 
an electric moment $\mathbf{D}$ will, when in the state $\alpha'$, spontaneously emit radiation of 
frequency $\nu = (E' - E'')/h$, where $E''$ is an energy-level, less than $E'$, of some 
state $\alpha''$, at the rate 
$$\frac{4}{3}\,\frac{(2\pi\nu)^4}{c^3}\,|\langle\alpha''|\mathbf{D}|\alpha'\rangle|^2. \tag{34}$$
The distribution of this radiation over the different directions of emission and 
its state of polarization for each direction will be the same as that for a classical 
electric dipole of moment equal to the real part of $\langle\alpha''|\mathbf{D}|\alpha'\rangle$. To interpret this rate 
of emission of radiant energy as a transition probability, we must divide it by 
the quantum of energy of this frequency, namely $h\nu$, and call it the probability per 
unit time of this quantum being spontaneously emitted, with the atomic system 
simultaneously dropping to the state $\alpha''$ of lower energy. These assumptions 
of Heisenberg are justified by the present radiation theory, supplemented by 
the spontaneous transition theory of Chapter X. 
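The division by $h\nu$ described above can be written out explicitly. Assuming that the rate of emission of energy (34) has the classical dipole form $\tfrac{4}{3}(2\pi\nu)^4 c^{-3}|\langle\alpha''|\mathbf{D}|\alpha'\rangle|^2$, the probability per unit time of spontaneous emission would be:

```latex
\[
\frac{1}{h\nu}\cdot\frac{4}{3}\,\frac{(2\pi\nu)^4}{c^3}\,
   |\langle\alpha''|\mathbf{D}|\alpha'\rangle|^2
 \;=\; \frac{4}{3}\,\frac{(2\pi\nu)^3}{\hbar c^3}\,
   |\langle\alpha''|\mathbf{D}|\alpha'\rangle|^2 ,
 \qquad \hbar = \frac{h}{2\pi},
 \qquad \nu = \frac{E' - E''}{h}.
\]
```

The step uses only $h\nu = \hbar\,(2\pi\nu)$, so one power of $2\pi\nu$ cancels and the probability coefficient grows as the cube of the frequency.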


46. Transitions caused by a perturbation independent of the time 

The perturbation method of §44 is still valid when the perturbing energy $V$ does 
not involve the time $t$ explicitly. Since the total Hamiltonian $H$ in this case 
does not involve $t$ explicitly, we could now, if desired, deal with the system by 
the perturbation method of §43 and find its stationary states. Whether this method 
would be convenient or not would depend on what we want to find out about 
the system. If what we have to calculate makes an explicit reference to the time, 
e.g. if we have to calculate the probability of the system being in a certain state at 
one time when we are given that it is in a certain state at another time, the method 
of §44 would be the more convenient one. 

* [Werner Heisenberg (1925). „Über quantentheoretische Umdeutung kinematischer 
und mechanischer Beziehungen.“ Zeitschrift für Physik 33 (1): 879-893. Bibcode: 
1925ZPhy...33..879H. doi: 10.1007/BF01328377. An English translation may be found in 
B. L. van der Waerden, trans., ed. (1968). Sources of Quantum Mechanics. New York: 
Dover. pp. 261-276] 

Let us see what the result (24) for the transition probability becomes when 
$V$ does not involve $t$ explicitly and let us take $t_0 = 0$ to simplify the writing. 
The matrix element $\langle\alpha''|V|\alpha'\rangle$ is now independent of $t$, and from (18) 
$$\langle\alpha''|V^*(t')|\alpha'\rangle = \langle\alpha''|V|\alpha'\rangle\,e^{i(E''-E')t'/\hbar}, \qquad \int_0^t \langle\alpha''|V^*(t')|\alpha'\rangle\,dt' = \langle\alpha''|V|\alpha'\rangle\,\frac{e^{i(E''-E')t/\hbar} - 1}{i(E''-E')/\hbar}, \tag{35}$$
provided $E'' \neq E'$. Thus the transition probability (24) becomes 
$$P(\alpha', \alpha'') = |\langle\alpha''|V|\alpha'\rangle|^2\,[e^{i(E''-E')t/\hbar} - 1][e^{-i(E''-E')t/\hbar} - 1]/(E''-E')^2 = 2\,|\langle\alpha''|V|\alpha'\rangle|^2\,[1 - \cos\{(E''-E')t/\hbar\}]/(E''-E')^2. \tag{36}$$

If $E''$ differs appreciably from $E'$ this transition probability is small and remains 
so for all values of $t$. This result is required by the law of the conservation of energy. 
The total energy $H$ is constant and hence the proper-energy $E$ (i.e. the energy with 
neglect of the part $V$ due to the perturbation), being approximately equal to $H$, 
must be approximately constant. This means that if $E$ initially has the numerical 
value $E'$, at any later time there must be only a small probability of its having 
a numerical value differing considerably from $E'$. 
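The statement that (36) remains small for all $t$ when $E''$ differs appreciably from $E'$ can be illustrated by comparing it with an exact solution. The sketch below assumes a system with only two states, a real constant off-diagonal matrix element `v` and no diagonal elements of $V$, for which the exact transition probability is the standard two-level result $\frac{4v^2}{\Omega^2}\sin^2(\Omega t/2\hbar)$ with $\Omega^2 = (E''-E')^2 + 4v^2$. That closed form is brought in from outside the text, and units with $\hbar = 1$ are assumed.

```python
import math

def first_order(v, dE, t, hbar=1.0):
    """Transition probability (36) for a constant matrix element v."""
    return 2 * v ** 2 * (1 - math.cos(dE * t / hbar)) / dE ** 2

def exact_two_level(v, dE, t, hbar=1.0):
    """Exact two-state result (assumed model: V purely off-diagonal and real)."""
    omega = math.sqrt(dE ** 2 + 4 * v ** 2)
    return 4 * v ** 2 / omega ** 2 * math.sin(omega * t / (2 * hbar)) ** 2

v, dE = 0.01, 1.0                       # hypothetical values, |v| << |dE|
times = [0.5 * k for k in range(1, 21)]
for t in times:
    print(t, first_order(v, dE, t), exact_two_level(v, dE, t))

upper_bound = 4 * v ** 2 / dE ** 2      # largest value (36) can ever reach
```

For these values neither expression exceeds `upper_bound`, which is of the order of $(v/\Delta E)^2$, in line with the energy argument given above.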

On the other hand, when the initial state $\alpha'$ is such that there exists another 
state $\alpha''$ having the same or very nearly the same proper-energy $E$, the probability 
of a transition to the final state $\alpha''$ may be quite large. The case of physical 
interest now is that in which there is a continuous range of final states $\alpha''$ having 
a continuous range of proper-energy levels $E''$ passing through the value $E'$ of 
the proper-energy of the initial state. The initial state must not be one of 
the continuous range of final states, but may be either a separate discrete state 
or one of another continuous range of states. We shall now have, remembering 
the rules of §18 for the interpretation of probability amplitudes with continuous 
ranges of states, that, with $P(\alpha', \alpha'')$ having the value (36), the probability 
of a transition to a final state within the small range $\alpha''$ to $\alpha'' + d\alpha''$ will 
be $P(\alpha', \alpha'')\,d\alpha''$ if the initial state $\alpha'$ is discrete and will be proportional to 
this quantity if $\alpha'$ is one of a continuous range. 

We may suppose that the $\alpha$’s describing the final state consist of $E$ together 
with a number of other dynamical variables $\beta$, so that we have a representation 
like that of §43 for the degenerate case. (The $\beta$’s, however, need have no meaning 
for the initial state $\alpha'$.) We shall suppose for definiteness that the $\beta$’s have only 
discrete eigenvalues. The total probability of a transition to a final state $\alpha''$ for 
which the $\beta$’s have the values $\beta''$ and $E$ has any value (there will be a strong 
probability of its having a value near the initial value $E'$) will now be (or be 
proportional to) 
$$\begin{aligned} \int P(\alpha', \alpha'')\,dE'' &= 2\int |\langle E''\beta''|V|\alpha'\rangle|^2\,[1 - \cos\{(E''-E')t/\hbar\}]/(E''-E')^2\,dE'' \\ &= 2t\hbar^{-1}\int |\langle E' + \hbar x/t,\,\beta''|V|\alpha'\rangle|^2\,[1 - \cos x]/x^2\,dx \end{aligned} \tag{37}$$
if one makes the substitution $(E''-E')t/\hbar = x$. For large values of $t$ this reduces to 
$$2t\hbar^{-1}\,|\langle E'\beta''|V|\alpha'\rangle|^2\int_{-\infty}^{\infty} [1 - \cos x]/x^2\,dx = 2\pi t\hbar^{-1}\,|\langle E'\beta''|V|\alpha'\rangle|^2. \tag{38}$$
Thus the total probability up to time $t$ of a transition to a final state for which 
the $\beta$’s have the values $\beta''$ is proportional to $t$. There is therefore a definite 
probability coefficient, or probability per unit time, for the transition process under 
consideration, having the value 
$$2\pi\hbar^{-1}\,|\langle E'\beta''|V|\alpha'\rangle|^2. \tag{39}$$
It is proportional to the square of the modulus of the matrix element, associated 
with this transition, of the perturbing energy. 
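The step from (37) to (38) rests on the value $\pi$ of the integral of $[1 - \cos x]/x^2$ over all $x$. A quick numerical check is sketched below; the midpoint rule, the cut-off `X` and the tail estimate $2/X$ (which replaces $\cos x$ by its mean value zero beyond the cut-off) are choices made for this illustration only.

```python
import math

def integrand(x):
    """[1 - cos x]/x^2, written as 2 sin^2(x/2)/x^2 to avoid cancellation."""
    return 2.0 * math.sin(0.5 * x) ** 2 / (x * x)

def integral_up_to(X, steps):
    """Midpoint-rule value of the integral over (-X, X), using symmetry in x."""
    h = X / steps
    return 2.0 * h * sum(integrand((k + 0.5) * h) for k in range(steps))

X = 400.0 * math.pi                 # cut-off, a whole number of periods
body = integral_up_to(X, 400000)
tail = 2.0 / X                      # estimate of the part with |x| > X
approx = body + tail
print(approx, math.pi)
```

The tail correction is small here, showing that nearly all of the integral comes from moderate values of $x$, i.e. from final energies within a few multiples of $\hbar/t$ of $E'$.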

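If the matrix element $\langle E'\beta''|V|\alpha'\rangle$ is small compared with other matrix 
elements of $V$, we must work with the more accurate formula (26). We have 
from (35) 
$$\begin{aligned} \int_0^t \langle\alpha''|V^*(t')|\alpha'''\rangle\,dt' \int_0^{t'} \langle\alpha'''|V^*(t'')|\alpha'\rangle\,dt'' &= \langle\alpha''|V|\alpha'''\rangle\langle\alpha'''|V|\alpha'\rangle \int_0^t e^{i(E''-E''')t'/\hbar}\,dt' \int_0^{t'} e^{i(E'''-E')t''/\hbar}\,dt'' \\ &= \frac{\langle\alpha''|V|\alpha'''\rangle\langle\alpha'''|V|\alpha'\rangle}{i(E'''-E')/\hbar} \int_0^t \left[e^{i(E''-E')t'/\hbar} - e^{i(E''-E''')t'/\hbar}\right] dt'. \end{aligned}$$
For $E''$ close to $E'$, only the first term in the integrand here gives rise to 
a transition probability of physical importance and the second term may be 
discarded. Using this result in (26) we get 
$$P(\alpha', \alpha'') = 2\left|\langle\alpha''|V|\alpha'\rangle - \sum_{\alpha'''\neq\alpha',\alpha''} \frac{\langle\alpha''|V|\alpha'''\rangle\langle\alpha'''|V|\alpha'\rangle}{E'''-E'}\right|^2 \frac{1 - \cos\{(E''-E')t/\hbar\}}{(E''-E')^2},$$
which replaces (36). Proceeding as before, we obtain for the transition probability 
per unit time to a final state for which the $\beta$’s have the values $\beta''$ and $E$ has a value 
close to its initial value $E'$ 
$$\frac{2\pi}{\hbar}\left|\langle E'\beta''|V|\alpha'\rangle - \sum_{\alpha'''\neq\alpha',\alpha''} \frac{\langle E'\beta''|V|\alpha'''\rangle\langle\alpha'''|V|\alpha'\rangle}{E'''-E'}\right|^2. \tag{40}$$
This formula shows how intermediate states, differing from the initial state and 
final state, play a role in the determination of a probability coefficient. 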

In order that the approximations used in deriving (39) and (40) may be valid, 
the time $t$ must be not too small and not too large. It must be large compared 
with the periods of the atomic system in order that the approximate evaluation 
of the integral (37) leading to the result (38) may be valid, while it must not be 
excessively large or else the general formula (24) or (26) will break down. In fact 
one could make the probability (38) greater than unity by taking $t$ large enough. 
The upper limit to $t$ is fixed by the condition that the probability (24) or (26), or $t$ 
times (39) or (40), must be small compared with unity. There is no difficulty in $t$ 
satisfying both these conditions simultaneously provided the perturbing energy $V$ 
is sufficiently small. 


47. The anomalous Zeeman effect 

One of the simplest examples of the perturbation method of §43 is the calculation 
of the first-order change in the energy-levels of an atom caused by a uniform 
magnetic field. The problem of a hydrogen atom in a uniform magnetic field 
has already been dealt with in §41 and was so simple that perturbation theory 
was unnecessary. The case of a general atom is not much more complicated when 
we make a few approximations such that we can set up a simple model for the atom. 

We first of all consider the atom in the absence of the magnetic field and 
look for constants of the motion or quantities that are approximately constants of 
the motion. The total angular momentum of the atom, the vector $\mathbf{j}$ say, is certainly 
a constant of the motion. This angular momentum may be regarded as the sum 
of two parts, the total orbital angular momentum of all the electrons, $\mathbf{l}$ say, 
and the total spin angular momentum, $\mathbf{s}$ say. Thus we have $\mathbf{j} = \mathbf{l} + \mathbf{s}$. Now the effect 
of the spin magnetic moments on the motion of the electrons is small compared 
with the effect of the Coulomb forces and may be neglected as a first approximation. 
With this approximation the spin angular momentum of each electron is a constant 
of the motion, there being no forces tending to change its orientation. Thus $\mathbf{s}$, 
and hence also $\mathbf{l}$, will be constants of the motion. The magnitudes, $l$, $s$ and $j$ say, 
of $\mathbf{l}$, $\mathbf{s}$ and $\mathbf{j}$ will be given by 
$$\begin{aligned} l + \tfrac{1}{2}\hbar &= (l_x^2 + l_y^2 + l_z^2 + \tfrac{1}{4}\hbar^2)^{1/2}, \\ s + \tfrac{1}{2}\hbar &= (s_x^2 + s_y^2 + s_z^2 + \tfrac{1}{4}\hbar^2)^{1/2}, \\ j + \tfrac{1}{2}\hbar &= (j_x^2 + j_y^2 + j_z^2 + \tfrac{1}{4}\hbar^2)^{1/2}, \end{aligned}$$
corresponding to equation (39) of §36. They commute with each other, and from 
(47) of §36 we see that with given numerical values for $l$ and $s$ the possible 
numerical values for $j$ are $l+s,\ l+s-\hbar,\ \ldots,\ |l-s|$. 
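The rule for the possible values of $j$ is easy to tabulate. The following sketch lists them for given $l$ and $s$, all measured in units of $\hbar$ (an assumption of the illustration) and handled as exact fractions so that half-odd integral values cause no rounding trouble. The function name is hypothetical.

```python
from fractions import Fraction

def possible_j(l, s):
    """Values l+s, l+s-1, ..., |l-s| of j, in units of hbar."""
    l, s = Fraction(l), Fraction(s)
    values = []
    j = l + s
    while j >= abs(l - s):
        values.append(j)
        j -= 1
    return values

half = Fraction(1, 2)
print(possible_j(1, half))   # one electron outside closed shells, l = 1
print(possible_j(2, 1))      # a case with integral spin
```

The number of values returned is $2s + 1$ when $s \le l$ and $2l + 1$ otherwise, which is the number of components of the corresponding multiplet.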

Let us consider a stationary state for which $l$, $s$ and $j$ have definite numerical 
values in agreement with the above scheme. The energy of this state will depend 
on $l$, but one might think that with neglect of the spin magnetic moments it would 
be independent of $s$, and also of the direction of the vector $\mathbf{s}$ relative to $\mathbf{l}$, and thus 
of $j$. It will be found in Chapter IX, however, that the energy depends very much 
on the magnitude $s$ of the vector $\mathbf{s}$, although independent of its direction when one 
neglects the spin magnetic moments, on account of certain phenomena arising from 
the fact that the electrons are indistinguishable one from another. There are thus 
different energy-levels of the system for each different value of $l$ and $s$. This means 
that $l$ and $s$ are functions of the energy, according to the general definition of 
a function given in §11, since the $l$ and $s$ of a stationary state are fixed when 
the energy of that state is fixed. 

We can now take into account the effect of the spin magnetic moments, 
treating it as a small perturbation according to the method of §43. The energy 
of the unperturbed system will still be approximately a constant of the motion 
and hence $l$ and $s$, being functions of this energy, will still be approximately 
constants of the motion. The directions of the vectors $\mathbf{l}$ and $\mathbf{s}$, however, not being 
functions of the unperturbed energy, need not now be approximately constants of 
the motion and may undergo large secular variations. Since the vector $\mathbf{j}$ is constant, 
the only possible variation of $\mathbf{l}$ and $\mathbf{s}$ is a precession about the vector $\mathbf{j}$. We thus 
have an approximate model of the atom consisting of the two vectors $\mathbf{l}$ and $\mathbf{s}$ of 
constant lengths precessing about their sum $\mathbf{j}$, which is a fixed vector. The energy 
is determined mainly by the magnitudes of $\mathbf{l}$ and $\mathbf{s}$ and depends only slightly on 
their relative directions, specified by $j$. Thus states with the same $l$ and $s$ and 
different $j$ will have only slightly different energy-levels, forming what is called 
a multiplet term. 

Let us now take this atomic model as our unperturbed system and suppose 
it to be subjected to a uniform magnetic field of magnitude $\mathcal{H}$ in the direction of 
the z-axis. The extra energy due to this magnetic field will consist of a term 
$$(e\mathcal{H}/2mc)(m_z + \hbar\sigma_z), \tag{41}$$
like the last term in equation (89) of §41, contributed by each electron, and will 
thus be altogether 
$$(e\mathcal{H}/2mc)\sum(m_z + \hbar\sigma_z) = (e\mathcal{H}/2mc)(l_z + 2s_z) = (e\mathcal{H}/2mc)(j_z + s_z). \tag{42}$$
This is our perturbing energy $V$. We shall now use the method of §43 to determine 
the changes in the energy-levels caused by this $V$. The method will be legitimate 
only provided the field is so weak that $V$ is small compared with the energy 
differences within a multiplet. 

Our unperturbed system is degenerate, on account of the direction of 
the vector $\mathbf{j}$ being undetermined. We must therefore take, from the representative 
of $V$ in a Heisenberg representation for the unperturbed system, those matrix 
elements that refer to one particular energy-level for their row and column, 
and obtain the eigenvalues of the matrix thus formed. We can do this best by first 
splitting up $V$ into two parts, one of which is a constant of the unperturbed motion, 
so that its representative contains only matrix elements referring to the same 
unperturbed energy-level for their row and column, while the representative of 
the other contains only matrix elements referring to two different unperturbed 
energy-levels for their row and column, so that this second part does not affect 
the first-order perturbation. The term involving $j_z$ in (42) is a constant of 
the unperturbed motion and thus belongs entirely to the first part. For the term 
involving $s_z$ we have 
$$s_z(j_x^2 + j_y^2 + j_z^2) = j_z(s_xj_x + s_yj_y + s_zj_z) + (s_zj_x - j_zs_x)j_x + (s_zj_y - j_zs_y)j_y$$
or 
$$s_z = j_z\,\frac{j(j+\hbar) - l(l+\hbar) + s(s+\hbar)}{2j(j+\hbar)} - [\gamma_yj_x - \gamma_xj_y]\,\frac{1}{j(j+\hbar)}, \tag{43}$$
where 
$$\gamma_x = s_zj_y - j_zs_y = s_zl_y - l_zs_y, \qquad \gamma_y = s_xj_z - j_xs_z = s_xl_z - l_xs_z. \tag{44}$$
The first term in this expression for $s_z$ is a constant of the unperturbed motion and 
thus belongs entirely to the first part, while the second term, as we shall now see, 
belongs entirely to the second part. 

Corresponding to (44) we can introduce 
$$\gamma_z = s_yl_x - l_ys_x.$$
It can now easily be verified that 
$$j_x\gamma_x + j_y\gamma_y + j_z\gamma_z = 0$$
and from (30) of §35 
$$[j_z, \gamma_x] = \gamma_y, \qquad [j_z, \gamma_y] = -\gamma_x, \qquad [j_z, \gamma_z] = 0.$$
These relations connecting $j_x$, $j_y$, $j_z$ and $\gamma_x$, $\gamma_y$, $\gamma_z$ are of the same form as 
the relations connecting $m_x$, $m_y$, $m_z$ and $x$, $y$, $z$ in the calculation in §40 
of the selection rule for the matrix elements of $z$ in a representation with $k$ 
diagonal. From the result there obtained that all matrix elements of $z$ vanish 
except those referring to two $k$ values differing by $\pm\hbar$, we can infer that all matrix 
elements of $\gamma_z$, and similarly of $\gamma_x$ and $\gamma_y$, in a representation with $j$ diagonal, 
vanish except those referring to two $j$ values differing by $\pm\hbar$. The coefficients 
of $\gamma_x$ and $\gamma_y$ in the second term on the right-hand side of (43) commute with $j$, 
so the representative of the whole of this term will contain only matrix elements 
referring to two $j$ values differing by $\pm\hbar$, and thus referring to two different 
energy-levels of the unperturbed system. 

Hence the perturbing energy $V$ becomes, when we neglect that part of it whose 
representative consists of matrix elements referring to two different unperturbed 
energy-levels, 
$$\frac{e\mathcal{H}}{2mc}\,j_z\left\{1 + \frac{j(j+\hbar) - l(l+\hbar) + s(s+\hbar)}{2j(j+\hbar)}\right\}. \tag{45}$$
The eigenvalues of this give the first-order changes in the energy-levels. We can 
make the representative of this expression diagonal by choosing our representation 
such that $j_z$ is diagonal, and it then gives us directly the first-order changes 
in the energy-levels caused by the magnetic field. This expression is known as 
Landé’s formula.† 
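The factor in curly brackets in (45) is readily evaluated for particular multiplet components. The sketch below assumes that $j$, $l$ and $s$ are given in units of $\hbar$, so that $j(j+\hbar)$ becomes $j(j+1)$, and it expresses the first-order shifts in units of $e\mathcal{H}\hbar/2mc$. Exact fractions are used and the function names are illustrative only.

```python
from fractions import Fraction

def lande_factor(j, l, s):
    """The bracket in (45): 1 + [j(j+1) - l(l+1) + s(s+1)] / [2j(j+1)]."""
    j, l, s = Fraction(j), Fraction(l), Fraction(s)
    if j == 0:
        raise ValueError("the factor is undefined for j = 0 (no splitting)")
    return 1 + (j * (j + 1) - l * (l + 1) + s * (s + 1)) / (2 * j * (j + 1))

def first_order_shifts(j, l, s):
    """Eigenvalues of (45) in units of e*H*hbar/2mc, for j_z = -j, ..., j."""
    g = lande_factor(j, l, s)
    j = Fraction(j)
    shifts = []
    m = -j
    while m <= j:
        shifts.append(g * m)
        m += 1
    return shifts

half = Fraction(1, 2)
print(lande_factor(Fraction(3, 2), 1, half))    # l = 1, s = 1/2, j = 3/2
print(lande_factor(half, 1, half))              # l = 1, s = 1/2, j = 1/2
print(first_order_shifts(half, 0, half))        # l = 0, s = 1/2
```

The two components of the same doublet receive different factors, which is why the splitting pattern differs from the normal Zeeman effect.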

† [This is a version of Alfred Landé’s g-formula. Alfred Landé (1951) Quantum Mechanics, 
Pitman, p. 208f; Alfred Landé, „Termstruktur und Zeemaneffekt der Multipletts“, Zeitschrift für 
Physik (1923) 15, pp. 189-205, doi: 10.1007/BF01330473] 

The result (45) holds only provided the perturbing energy $V$ is small 
compared with the energy differences within a multiplet. For larger values 
of $V$ a more complicated theory is required. For very strong fields, however, 
for which $V$ is large compared with the energy differences within a multiplet, 
the theory is again very simple. We may now neglect altogether the energy 
of the spin magnetic moments for the atom with no external field, so that 
for our unperturbed system the vectors $\mathbf{l}$ and $\mathbf{s}$ themselves are constants 
of the motion, and not merely their magnitudes $l$ and $s$. Our perturbing 
energy $V$, which is still $(e\mathcal{H}/2mc)(j_z + s_z)$, is now a constant of the motion 
for the unperturbed system, so that its eigenvalues give directly the changes in 
the energy-levels. These eigenvalues are integral or half-odd integral multiples 
of $e\mathcal{H}\hbar/2mc$ according to whether the number of electrons in the atom is even 
or odd. 


VIII. COLLISION PROBLEMS 


48. General remarks 

IN this chapter we shall investigate problems connected with a particle which, 
coming from infinity, encounters or ‘collides with’ some atomic system and, 
after being scattered through a certain angle, goes off to infinity again. The atomic 
system which does the scattering we shall call, for brevity, the scatterer. We thus 
have a dynamical system composed of an incident particle and a scatterer 
interacting with each other, which we must deal with according to the laws of 
quantum mechanics, and for which we must, in particular, calculate the probability 
of scattering through any given angle. The scatterer is usually assumed to be of 
infinite mass and to be at rest throughout the scattering process. The problem 
was first solved by Max Born by a method substantially equivalent to that of 
the next section. We must take into account the possibility that the scatterer, 
considered as a system by itself, may have a number of different stationary states 
and that if it is initially in one of these states when the particle arrives from 
infinity, it may be left in a different one when the particle goes off to infinity again. 
The colliding particle may thus induce transitions in the scatterer. 

The Hamiltonian for the whole system of scatterer plus particle will not 
involve the time explicitly, so that this whole system will have stationary states 
represented by periodic solutions of Schrödinger’s wave equation. The meaning of 
these stationary states requires a little care to be properly understood. It is evident 
that for any state of motion of the system the particle will spend nearly all its time 
at infinity, so that the time average of the probability of the particle being in any 
finite volume will be zero. Now for a stationary state the probability of the particle 
being in a given finite volume, like any other result of observation, must be 
independent of the time, and hence this probability will equal its time average, 
which we have seen is zero. Thus only the relative probabilities of the particle 
being in different finite volumes will be physically significant, their absolute 
values being all zero. The total energy of the system has a continuous range 
of eigenvalues, since the initial energy of the particle can be anything. Thus a ket, 
$|s\rangle$ say, corresponding to a stationary state, being an eigenket of the total energy, 
must be of infinite length. We can see a physical reason for this, since if $|s\rangle$ were 
normalized and if $Q$ denotes that observable (a certain function of the position of 
the particle) that is equal to unity if the particle is in a given finite volume and 
zero otherwise, then $\langle s|Q|s\rangle$ would be zero, meaning that the average value of $Q$, 
i.e. the probability of the particle being in the given volume, is zero. Such a ket $|s\rangle$ 
would not be a convenient one to work with. However, with $|s\rangle$ of infinite length, 
$\langle s|Q|s\rangle$ can be finite and would then give the relative probability of the particle 
being in the given volume. 

In picturing a state of a system corresponding to a ket $|x\rangle$ which is not 
normalized, but for which $\langle x|x\rangle = n$ say, it may be convenient to suppose that 
we have $n$ similar systems all occupying the same space but with no interaction 
between them, so that each one follows out its own motion independently of 
the others, as we had in the theory of the Gibbs ensemble in §33. We can 
then interpret $\langle x|\alpha|x\rangle$, where $\alpha$ is any observable, directly as the total $\alpha$ for all 
the $n$ systems. In applying these ideas to the above-mentioned $|s\rangle$ of infinite length, 
corresponding to a stationary state of the system of scatterer plus colliding particle, 
we should picture an infinite number of such systems with the scatterers all located 
at the same point and the particles distributed continuously throughout space. 
The number of particles in a given finite volume would be pictured as $\langle s|Q|s\rangle$, 
$Q$ being the observable defined above, which has the value unity when the particle 
is in the given volume and zero otherwise. If the ket is represented by a Schrödinger 
wave function involving the Cartesian coordinates of the particle, then the square 
of the modulus of the wave function could be interpreted directly as the density of 
particles in the picture. One must remember, however, that each of these particles 
has its own individual scatterer. Different particles may belong to scatterers 
in different states. There will thus be one particle density for each state of 
the scatterer, namely the density of those particles belonging to scatterers in 
that state. This is taken account of by the wave function involving variables 
describing the state of the scatterer in addition to those describing the position of 
the particle. 

For determining scattering coefficients we have to investigate stationary states 
of the whole system of scatterer plus particle. For instance, if we want to determine 
the probability of scattering in various directions when the scatterer is initially in 
a given stationary state and the incident particle has initially a given velocity 
in a given direction, we must investigate that stationary state of the whole 
system whose picture, according to the above method, contains at great distances 
from the point of location of the scatterers only particles moving with the given 
initial velocity and direction and belonging each to a scatterer in the given 
initial stationary state, together with particles moving outward from the point 
of location of the scatterers and belonging possibly to scatterers in various 
stationary states. This picture corresponds closely to the actual state of affairs 
in an experimental determination of scattering coefficients, with the difference 
that the picture really describes only one actual system of scatterer plus particle.
The distribution of outward moving particles at infinity in the picture gives 
us immediately all the information about scattering coefficients that could be 
obtained by experiment. For practical calculations about the stationary state 
described by this picture one may use a perturbation method somewhat like that 
of §43, taking as unperturbed system, for example, that for which there is no 
interaction between the scatterer and particle. 

In dealing with collision problems, a further possibility to be taken into 
consideration is that the scatterer may perhaps be capable of absorbing and 
re-emitting the particle. This possibility arises when there exists one or more states 
of absorption of the whole system, a state of absorption being an approximately 
stationary state which is closed in the sense mentioned at the end of §38 
(i.e. for which the probability of the particle being at a greater distance than $r$
from the scatterer tends to zero as $r \to \infty$). Since a state of absorption
is only approximately stationary, its property of being closed will be only 
a transient one, and after a sufficient lapse of time there will be a finite probability 
of the particle being on its way to infinity. Physically this means there is a finite 
probability of spontaneous emission of the particle. The fact that we had to use 
the word ‘approximately’ in stating the conditions required for the phenomena of 
emission and absorption to be able to occur shows that these conditions are not 
expressible in exact mathematical language. One can give a meaning to these 
phenomena only with reference to a perturbation method. They occur when 
the unperturbed system (of scatterer plus particle) has stationary states that 
are closed. The introduction of the perturbation spoils the stationary property 
of these states and gives rise to spontaneous emission and its converse absorption. 

For calculating absorption and emission probabilities it is necessary to deal 
with non-stationary states of the system, in contradistinction to the case for 
scattering coefficients, so that the perturbation method of §44 must be used. 
Thus for calculating an emission coefficient we must consider the non-stationary 
states of absorption described above. Again, since an absorption is always 
followed by a re-emission, it cannot be distinguished from a scattering in any 
experiment involving a steady state of affairs, corresponding to a stationary state 
of the system. The distinction can be made only by reference to a non-steady 
state of affairs, e.g. by use of a stream of incident particles that has an abrupt¹
beginning, so that the scattered particles will appear immediately after the incident
particles meet the scatterers, while those that have been absorbed and re-emitted
will begin to appear only some time later. This stream of particles would be
the picture of a certain ket of infinite length, which could be used for calculating
the absorption coefficient.

¹ [Original: 'sharp']


49. The scattering coefficient

We shall now consider the calculation of scattering coefficients, taking first the case 
when there is no absorption and emission, which means that our unperturbed 
system has no closed stationary states. We may conveniently take this unperturbed
system to be that for which there is no interaction between the scatterer and
particle. Its Hamiltonian will thus be of the form

$$E = H_s + W, \tag{1}$$

where $H_s$ is that for the scatterer alone and $W$ that for the particle alone, namely,
with neglect of relativistic mechanics,

$$W = \frac{1}{2m}\left(p_x^2 + p_y^2 + p_z^2\right). \tag{2}$$

The perturbing energy $V$, assumed small, will now be a function of the Cartesian
coordinates of the particle $x$, $y$, $z$, and also, perhaps, of its momenta $p_x$, $p_y$, $p_z$,
together with dynamical variables describing the scatterer.

Since we are now interested only in stationary states of the whole system, 
we use a perturbation method like that of §43. Our unperturbed system now 
necessarily has a continuous range of energy-levels, since it contains a free 
particle, and this gives rise to certain modifications in the perturbation method. 
The question of the change in the energy-levels caused by the perturbation, 
which was the main question of §43, no longer has a meaning, and the convention
in §43 of using the same number of primes to denote nearly equal eigenvalues of 
E and H now* becomes redundant. Again, the splitting of energy-levels which 
we had in §43 when the unperturbed system is degenerate cannot now arise, 
since if the unperturbed system is degenerate the perturbed one, which must 
also have a continuous range of energy-levels, will also be degenerate to exactly 
the same extent. 

We again use the general scheme of equations developed at the beginning of §43, 
equations (1) to (4) there, but we now take our unperturbed stationary state 
forming the zero-order approximation to belong to an energy-level E’ just equal to 
the energy-level H’ of our perturbed stationary state. Thus the a’s introduced in 
the second of equations (3) of §43 are now all zero and the second of equations (4)
there now reads

$$(E' - E)\,|1\rangle = V\,|0\rangle. \tag{3}$$

Similarly, the third of equations (4) of §43 now reads

$$(E' - E)\,|2\rangle = V\,|1\rangle. \tag{4}$$

We shall proceed to solve equation (3) and to obtain the scattering coefficient to
the first order. We shall need equation (4) in §51.

Let $\alpha$ denote a complete set of commuting observables describing the scatterer,
which are constants of the motion when the scatterer is alone and may thus be
used for labelling the stationary states of the scatterer. This requires that $H_s$ shall
commute with the $\alpha$'s and be a function of them. We can now take a representation
of the whole system in which the $\alpha$'s and $x$, $y$, $z$, the coordinates of the particle,
are diagonal. This will make $H_s$ diagonal. Let $|0\rangle$ be represented by $\langle x\alpha'|0\rangle$ and
$|1\rangle$ by $\langle x\alpha'|1\rangle$, the single variable $x$ being written to denote $x$, $y$, $z$, and the prime
being omitted from $x$ for brevity. In the same way the single differential $d^3x$ will
be written to denote the product $dx\,dy\,dz$. Equation (3), written in terms of
representatives, becomes, with the help of (1) and (2),

$$\left\{E' - H_s(\alpha') + \frac{\hbar^2}{2m}\nabla^2\right\}\langle x\alpha'|1\rangle
= \sum_{\alpha''}\int \langle x\alpha'|V|x''\alpha''\rangle\, d^3x''\, \langle x''\alpha''|0\rangle. \tag{5}$$

* [Original: 'drops out.']


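Suppose that the incident particle has the momentum $\mathbf{p}^0$ and that the initial
stationary state of the scatterer is $\alpha^0$. The stationary state of our unperturbed
system is now the one for which $\mathbf{p} = \mathbf{p}^0$ and $\alpha = \alpha^0$, and hence its representative is

$$\langle x\alpha'|0\rangle = \delta_{\alpha'\alpha^0}\, e^{i(\mathbf{p}^0\cdot\mathbf{x})/\hbar}. \tag{6}$$

This makes equation (5) reduce to

$$\left\{E' - H_s(\alpha') + \frac{\hbar^2}{2m}\nabla^2\right\}\langle x\alpha'|1\rangle
= \int \langle x\alpha'|V|x^0\alpha^0\rangle\, d^3x^0\, e^{i(\mathbf{p}^0\cdot\mathbf{x}^0)/\hbar},$$

or

$$(k^2 + \nabla^2)\,\langle x\alpha'|1\rangle = F, \tag{7}$$

where

$$k^2 = 2m\hbar^{-2}\{E' - H_s(\alpha')\} \tag{8}$$

and

$$F = 2m\hbar^{-2}\int \langle x\alpha'|V|x^0\alpha^0\rangle\, d^3x^0\, e^{i(\mathbf{p}^0\cdot\mathbf{x}^0)/\hbar}, \tag{9}$$

a definite function of $x$, $y$, $z$ and $\alpha'$. We must also have

$$E' = H_s(\alpha^0) + (\mathbf{p}^0)^2/2m. \tag{10}$$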

Our problem now is to obtain a solution $\langle x\alpha'|1\rangle$ of (7) which, for values
of $x$, $y$ and $z$ denoting points far from the scatterer, represents only outward
moving particles. The square of its modulus, $|\langle x\alpha'|1\rangle|^2$, will then give the density
of scattered particles belonging to scatterers in the state $\alpha'$ when the density
of the incident particles is $|\langle x\alpha^0|0\rangle|^2$, which is unity. If we transform to polar
coordinates $r$, $\theta$, $\phi$, equation (7) becomes

$$\left\{\frac{\partial^2}{\partial r^2} + \frac{2}{r}\frac{\partial}{\partial r}
+ \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\sin\theta\frac{\partial}{\partial\theta}
+ \frac{1}{r^2\sin^2\theta}\frac{\partial^2}{\partial\phi^2} + k^2\right\}
\langle r\theta\phi\alpha'|1\rangle = F. \tag{11}$$

Now $F$ must tend to zero as $r \to \infty$, on account of the physical requirement that
the interaction energy between the scatterer and particle must tend to zero as
the distance between them tends to infinity. If we neglect $F$ in (11) altogether,
an approximate solution for large $r$ is

$$\langle r\theta\phi\alpha'|1\rangle = u(\theta\phi\alpha')\, r^{-1} e^{ikr}, \tag{12}$$

where $u$ is an arbitrary function of $\theta$, $\phi$ and $\alpha'$, since this expression substituted
in the left-hand side of (11) gives a result of order $r^{-3}$. When we do not neglect $F$,
the solution of (11) will still be of the form (12) for large $r$, provided $F$ tends to
zero sufficiently rapidly as $r \to \infty$, but the function $u$ will now be definite and
determined by the solution for smaller values of $r$.
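
The order of magnitude quoted for the result of substituting (12) in (11) may be
checked directly. The short calculation below is a sketch added for illustration;
the symbol $\Lambda$, denoting the angular part of the Laplacian, is introduced here and
is not a symbol of the text.

```latex
% The radial factor r^{-1} e^{ikr} is annihilated by the radial operator plus k^2:
\left\{\frac{\partial^2}{\partial r^2} + \frac{2}{r}\frac{\partial}{\partial r} + k^2\right\}
  r^{-1} e^{ikr} = 0 ,
% so only the angular derivatives acting on u survive:
\left\{\nabla^2 + k^2\right\} u\, r^{-1} e^{ikr} = r^{-3} e^{ikr}\, \Lambda u ,
\qquad
\Lambda = \frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\sin\theta\frac{\partial}{\partial\theta}
        + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\phi^2} .
```

The residue thus falls off as $r^{-3}$ for any smooth $u$, and vanishes identically when $u$
is independent of direction.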

For values $\alpha'$ of the $\alpha$'s such that $k^2$, defined by (8), is positive, the $k$ in
(12) must be chosen to be the positive square root of $k^2$, in order that (12)
may represent only outward moving particles, i.e. particles for which the radial
component of momentum, which from §38 equals $p_r - i\hbar r^{-1}$ or $-i\hbar(\partial/\partial r + r^{-1})$,
has a positive value. We now have that the density of scattered particles belonging
to scatterers in state $\alpha'$, equal to the square of the modulus of (12), falls off
with increasing $r$ according to the inverse square law, as is physically necessary,
and their angular distribution is given by $|u(\theta\phi\alpha')|^2$. Further, the magnitude,
$P'$ say, of the momentum of these scattered particles must equal $k\hbar$, the momentum
being radial for large $r$, so that their energy is equal to

$$\frac{P'^2}{2m} = \frac{k^2\hbar^2}{2m} = E' - H_s(\alpha')
= H_s(\alpha^0) - H_s(\alpha') + \frac{(\mathbf{p}^0)^2}{2m},$$

with the help of (8) and (10). This is just the energy of an incident
particle, namely $(\mathbf{p}^0)^2/2m$, reduced by the increase in energy of the scatterer,
namely $H_s(\alpha') - H_s(\alpha^0)$, in agreement with the law of conservation of energy.
For values $\alpha'$ of the $\alpha$'s such that $k^2$ is negative there are no scattered particles,
the total initial energy being insufficient for the scatterer to be left in the state $\alpha'$.
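
The energy balance just described is easily put into numerical form. The sketch
below is an illustration only: the function name `final_momentum` and the sample
values are assumptions, and any consistent system of units may be used. It returns
the magnitude P' of the scattered momentum, or `None` when the final state of
the scatterer is excluded by conservation of energy.

```python
import math


def final_momentum(p0, m, e_initial, e_final):
    """Magnitude P' of the scattered momentum, or None for a closed channel.

    Uses P'^2/2m = p0^2/2m - (e_final - e_initial), which combines
    equations (8) and (10); e_initial and e_final are the scatterer energies
    H_s(alpha^0) and H_s(alpha').
    """
    kinetic = p0 ** 2 / (2.0 * m) - (e_final - e_initial)
    if kinetic <= 0.0:
        return None  # k^2 not positive: no scattered particles for this state
    return math.sqrt(2.0 * m * kinetic)


print(final_momentum(2.0, 1.0, 0.0, 0.0))   # elastic: P' equals p0
print(final_momentum(2.0, 1.0, 0.0, 1.5))   # inelastic: P' is reduced
print(final_momentum(2.0, 1.0, 0.0, 3.0))   # insufficient energy
```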

We must now evaluate $u(\theta\phi\alpha')$ for a set of values $\alpha'$ for the $\alpha$'s such that $k^2$
is positive, and obtain the angular distribution of the scattered particles belonging
to scatterers in state $\alpha'$. It is sufficient to evaluate $u$ for the direction $\theta = 0$
of the pole of the polar coordinates, since this direction is arbitrary. We make
use of Green's theorem, which states that for any two functions of position
$A$ and $B$ the volume integral $\int (A\nabla^2 B - B\nabla^2 A)\, d^3x$ taken over any volume
equals the surface integral $\int (A\,\partial B/\partial n - B\,\partial A/\partial n)\, dS$ taken over the boundary of
the volume, $\partial/\partial n$ denoting differentiation along the normal to the surface. We take

$$A = e^{-ikr\cos\theta}, \qquad B = \langle r\theta\phi\alpha'|1\rangle$$

and apply the theorem to a large sphere with the origin as centre. The volume
integrand is thus

$$e^{-ikr\cos\theta}\,\nabla^2\langle r\theta\phi\alpha'|1\rangle - \langle r\theta\phi\alpha'|1\rangle\,\nabla^2 e^{-ikr\cos\theta}
= e^{-ikr\cos\theta}(\nabla^2 + k^2)\langle r\theta\phi\alpha'|1\rangle = e^{-ikr\cos\theta} F$$

from (7) or (11), while the surface integrand is, with the help of (12),

$$\begin{aligned}
e^{-ikr\cos\theta}\frac{\partial}{\partial r}\langle r\theta\phi\alpha'|1\rangle
 &- \langle r\theta\phi\alpha'|1\rangle\frac{\partial}{\partial r}e^{-ikr\cos\theta} \\
 &= e^{-ikr\cos\theta}\, u\left(-\frac{1}{r^2} + \frac{ik}{r}\right)e^{ikr}
   + \frac{iku}{r}\, e^{ikr}\cos\theta\; e^{-ikr\cos\theta} \\
 &= iku\, r^{-1}(1 + \cos\theta)\, e^{ikr(1-\cos\theta)}
\end{aligned}$$

with neglect of $r^{-2}$. Hence we get

$$\int e^{-ikr\cos\theta} F\, d^3x
= \int_0^{2\pi} d\phi \int_0^{\pi} r^2\sin\theta\, d\theta\; iku\, r^{-1}(1 + \cos\theta)\, e^{ikr(1-\cos\theta)},$$

the volume integral on the left being taken over the whole of space. The right-hand
side becomes, on being integrated by parts with respect to $\theta$,

$$\int_0^{2\pi} d\phi \left\{\Big[u(1 + \cos\theta)\, e^{ikr(1-\cos\theta)}\Big]_{\theta=0}^{\theta=\pi}
- \int_0^{\pi} e^{ikr(1-\cos\theta)}\frac{\partial}{\partial\theta}\big[u(1 + \cos\theta)\big]\, d\theta\right\}.$$

The second term in the { } brackets is of the order of magnitude of $r^{-1}$, as would be
revealed by further partial integrations, and may therefore be neglected. We are
thus left with

$$\int e^{-ikr\cos\theta} F\, d^3x = -2\int_0^{2\pi} d\phi\; u(0\phi\alpha') = -4\pi\, u(0\phi\alpha'),$$

giving the value of $u(\theta\phi\alpha')$ for the direction $\theta = 0$.
This result may be written

$$u(0\phi\alpha') = -(4\pi)^{-1}\int e^{-iP'r\cos\theta/\hbar} F\, d^3x, \tag{13}$$

since $P' = k\hbar$. If the vector $\mathbf{p}'$ denotes the momentum of the scattered electrons
coming off in a certain direction (and is thus of magnitude $P'$), the value of $u$ for
this direction will be

$$u(\theta'\phi'\alpha') = -(4\pi)^{-1}\int e^{-i(\mathbf{p}'\cdot\mathbf{x})/\hbar} F\, d^3x,$$

as follows from (13) if one takes this direction to be the pole of the polar
coordinates. This becomes, with the help of (9),

$$\begin{aligned}
u(\theta'\phi'\alpha') &= -(2\pi)^{-1} m\hbar^{-2}\iint e^{-i(\mathbf{p}'\cdot\mathbf{x})/\hbar}\, d^3x\,
\langle x\alpha'|V|x^0\alpha^0\rangle\, d^3x^0\, e^{i(\mathbf{p}^0\cdot\mathbf{x}^0)/\hbar} \\
&= -2\pi m h\, \langle p'\alpha'|V|p^0\alpha^0\rangle,
\end{aligned} \tag{14}$$

when one makes a transformation from the coordinates $x$ to the momenta $p$ of
the particle, using the transformation function (54) of §23. The single letter $p$ is
here used as a label for the three components of momentum.

The density of scattered particles belonging to scatterers in state $\alpha'$ is now
given by $|u(\theta'\phi'\alpha')|^2/r^2$. Since their velocity is $P'/m$, the rate at which
these particles appear per unit solid angle about the direction of the vector $\mathbf{p}'$ will
be $(P'/m)\,|u(\theta'\phi'\alpha')|^2$. The density of the incident particles is, as we have seen,
unity, so that the number of incident particles crossing unit area per unit time is
equal to their velocity $P^0/m$, where $P^0$ is the magnitude of $\mathbf{p}^0$. Hence the effective
area that must be hit by an incident particle in order to be scattered in a unit
solid angle about the direction $\mathbf{p}'$ and then belong to a scatterer in state $\alpha'$ will be

$$(P'/P^0)\,|u(\theta'\phi'\alpha')|^2 = 4\pi^2 m^2 h^2\,(P'/P^0)\,|\langle p'\alpha'|V|p^0\alpha^0\rangle|^2. \tag{15}$$

This is the scattering coefficient for transitions $\alpha^0 \to \alpha'$ of the scatterer.
It depends on that matrix element $\langle p'\alpha'|V|p^0\alpha^0\rangle$ of the perturbing energy $V$
whose column $p^0\alpha^0$ and whose row $p'\alpha'$ refer respectively to the initial and final
states of the unperturbed system, between which the scattering transition process
takes place. The result (15) is thus in some ways analogous to the result (24) of §44,
although the numerical coefficients are different in the two cases, corresponding to
the different natures of the two transition processes.
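
As a concrete check of (15), not taken from the text, one may consider a static
scatterer represented by a screened Coulomb (Yukawa) potential
V(r) = g exp(-mu r)/r. The potential, the symbols g and mu, the function names
and the sample numbers are all assumptions made for illustration. The matrix
element follows from the transformation function h^(-3/2) exp(i(p.x)/hbar), and
(15) should then agree with the familiar first-order result
(2 m g/hbar^2)^2/(mu^2 + q^2)^2, where hbar q is the magnitude of the momentum
transfer.

```python
import math

HBAR = 1.0                    # units with hbar = 1 (an arbitrary choice)
H = 2.0 * math.pi * HBAR      # Planck's constant h = 2*pi*hbar


def yukawa_matrix_element(q, g, mu):
    """<p'|V|p0> for V(r) = g*exp(-mu*r)/r, with hbar*q = |p' - p0|.

    Equals h^-3 times the Fourier integral of V, which is 4*pi*g/(mu^2 + q^2).
    """
    return 4.0 * math.pi * g / (mu ** 2 + q ** 2) / H ** 3


def scattering_coefficient(matrix_element, m, p_final, p_initial):
    """Equation (15): 4 pi^2 m^2 h^2 (P'/P0) |<p'a'|V|p0 a0>|^2."""
    return (4.0 * math.pi ** 2 * m ** 2 * H ** 2
            * (p_final / p_initial) * abs(matrix_element) ** 2)


def born_yukawa(q, g, mu, m):
    """Standard first-order cross-section for the same potential."""
    return (2.0 * m * g / HBAR ** 2) ** 2 / (mu ** 2 + q ** 2) ** 2


q, g, mu, m = 0.7, 0.3, 1.2, 1.0
# Elastic scattering from a fixed potential, so P' = P0.
print(scattering_coefficient(yukawa_matrix_element(q, g, mu), m, 1.0, 1.0))
print(born_yukawa(q, g, mu, m))
```

The two printed numbers should agree to rounding error, the factors of h and
hbar combining through h = 2 pi hbar.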


50. Solution with the momentum representation 

The result (15) for the scattering coefficient makes a reference only to that 
representation in which the momentum p is diagonal. One would thus expect 
to be able to get a more direct proof of the result by working all the time in 
the p-representation, instead of working in the x-representation and transforming 
at the end to the p-representation, as was done in §49. This would not at first sight 
appear to be a great improvement, as the lack of directness of the x-representation
method is offset by more direct applicability, it being possible to picture the square 
of the modulus of the x-representative of a state as the density of a stream of 
particles in process of being scattered. The x-representation method has, however, 
other more serious disadvantages. One of the main applications of the theory of 
collisions is to the case of photons as incident particles. Now a photon is not 
a simple particle but has a polarization. It is evident from classical electromagnetic 
theory that a photon with a definite momentum, i.e. one moving in a definite 
direction with a definite frequency, may have a definite state of polarization (linear, 
circular, etc.), while a photon with a definite position, which is to be pictured 
as an electromagnetic disturbance confined to a very small volume, cannot have 
any definite polarization. These facts mean that the polarization observable of 
a photon commutes with its momentum but not with its position. This results in 
the p-representation method being immediately applicable to the case of photons, 
it being only necessary to introduce the polarizing variable into the representatives 
and treat it along with the $\alpha$'s describing the scatterer, while the x-representation
method is not applicable. Further, in dealing with photons, it is necessary to take 
relativistic mechanics into account. This can easily be done in the p-representation 
method, but not so easily in the x-representation method. 
Equation (3) still holds with relativistic mechanics, but $W$ is now given by

$$W^2 = m^2c^4 + P^2c^2 = m^2c^4 + (p_x^2 + p_y^2 + p_z^2)\,c^2 \tag{16}$$

instead of by (2). Written in terms of p-representatives, equation (3) gives

$$\{E' - H_s(\alpha') - W\}\,\langle p\alpha'|1\rangle = \langle p\alpha'|V|0\rangle,$$

$p$ being written instead of $p'$ for brevity and $W$ being understood as a definite
function of $p_x$, $p_y$ and $p_z$ given by (16). This may be written

$$(W' - W)\,\langle p\alpha'|1\rangle = \langle p\alpha'|V|0\rangle, \tag{17}$$

where

$$W' = E' - H_s(\alpha') \tag{18}$$

and is the energy required by the law of conservation of energy for a scattered
particle belonging to a scatterer in state $\alpha'$. The ket $|0\rangle$ is represented by (6) in
the x-representation and the basic ket $|p^0\alpha^0\rangle$ is represented by

$$\langle x\alpha'|p^0\alpha^0\rangle = \delta_{\alpha'\alpha^0}\,\langle x|p^0\rangle
= \delta_{\alpha'\alpha^0}\, h^{-3/2}\, e^{i(\mathbf{p}^0\cdot\mathbf{x})/\hbar},$$

from the transformation function (54) of §23. Hence

$$|0\rangle = h^{3/2}\,|p^0\alpha^0\rangle, \tag{19}$$

and equation (17) may be written

$$(W' - W)\,\langle p\alpha'|1\rangle = h^{3/2}\,\langle p\alpha'|V|p^0\alpha^0\rangle. \tag{20}$$


We now make a transformation from the Cartesian coordinates $p_x$, $p_y$, $p_z$ of $\mathbf{p}$
to its polar coordinates $P$, $\omega$, $\chi$, given by

$$p_z = P\cos\omega, \qquad p_x = P\sin\omega\cos\chi, \qquad p_y = P\sin\omega\sin\chi.$$

If in the new representation we take the weight function $P^2\sin\omega$, then the weight
attached to any volume of p-space will be the same as in the previous
p-representation, so that the transformation will mean simply a relabelling of
the rows and columns of the matrices without any alteration of the matrix
elements. Thus (20) will become in the new representation

$$(W' - W)\,\langle P\omega\chi\alpha'|1\rangle = h^{3/2}\,\langle P\omega\chi\alpha'|V|P^0\omega^0\chi^0\alpha^0\rangle, \tag{21}$$

$W$ being now a function of the single variable $P$.

The coefficient of $\langle P\omega\chi\alpha'|1\rangle$, namely $W' - W$, is now simply a multiplying
factor and not a differential operator as it was with the x-representation method.
We can therefore divide out by this factor and obtain an explicit expression for
$\langle P\omega\chi\alpha'|1\rangle$. When, however, $\alpha'$ is such that $W'$, defined by (18), is greater
than $mc^2$, this factor will have the value zero for a certain point in the domain
of the variable $P$, namely the point $P = P'$, given in terms of $W'$ by (16).
The function $\langle P\omega\chi\alpha'|1\rangle$ will then have a singularity at this point. This singularity
shows that $\langle P\omega\chi\alpha'|1\rangle$ represents an infinite number of particles moving about at
great distances from the scatterers with energies indefinitely close to $W'$ and it is
therefore this singularity that we have to study to get the angular distribution of
the particles at infinity.

The result of dividing out (21) by the factor $W' - W$ is, according to (13) of §15,

$$\langle P\omega\chi\alpha'|1\rangle = h^{3/2}\,\langle P\omega\chi\alpha'|V|P^0\omega^0\chi^0\alpha^0\rangle/(W' - W)
+ \lambda(\omega\chi\alpha')\,\delta(W' - W), \tag{22}$$

where $\lambda$ is an arbitrary function of $\omega$, $\chi$ and $\alpha'$. To give a meaning to the first
term on the right-hand side of (22), we make the convention that its integral with
respect to $P$ over a range that includes the value $P'$ is the limit when $\epsilon \to 0$ of
the integral when the small domain $P' - \epsilon$ to $P' + \epsilon$ is excluded from the range
of integration. This is sufficient to make the meaning of (22) precise, since we are
interested effectively only in the integrals of the representatives of states when
the representation has continuous ranges of rows and columns. We see that
equation (21) is inadequate to determine the representative $\langle P\omega\chi\alpha'|1\rangle$ completely,
on account of the arbitrary function $\lambda$ occurring in (22). We must choose this $\lambda$
such that $\langle P\omega\chi\alpha'|1\rangle$ represents only outward moving particles, since we want
the only inward moving particles to be those corresponding to $|0\rangle$.

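Let us take first the general case when the representative $\langle P\omega\chi|\,\rangle$ of a state
of the particle satisfies an equation of the type

$$(W' - W)\,\langle P\omega\chi|\,\rangle = f(P\omega\chi), \tag{23}$$

where $f(P\omega\chi)$ is any function of $P$, $\omega$, and $\chi$, and $W'$ is a number greater
than $mc^2$, so that $\langle P\omega\chi|\,\rangle$ is of the form

$$\langle P\omega\chi|\,\rangle = f(P\omega\chi)/(W' - W) + \lambda(\omega\chi)\,\delta(W' - W), \tag{24}$$

and let us determine now what $\lambda$ must be in order that $\langle P\omega\chi|\,\rangle$ may represent
only outward moving particles. We can do this by transforming $\langle P\omega\chi|\,\rangle$ to
the x-representation, or rather the $(r\theta\phi)$-representation, and comparing it with
(12) for large values of $r$. The transformation function is

$$\langle r\theta\phi|P\omega\chi\rangle = h^{-3/2}\, e^{i(\mathbf{p}\cdot\mathbf{x})/\hbar}
= h^{-3/2}\, e^{iPr[\cos\omega\cos\theta + \sin\omega\sin\theta\cos(\chi-\phi)]/\hbar}.$$

For the direction $\theta = 0$ we find

$$\begin{aligned}
\langle r0\phi|\,\rangle &= h^{-3/2}\int_0^\infty P^2\, dP \int_0^{2\pi} d\chi \int_0^{\pi} \sin\omega\, d\omega\;
e^{iPr\cos\omega/\hbar}\,\langle P\omega\chi|\,\rangle \\
&= i h^{-3/2}\int_0^\infty P^2\, dP \int_0^{2\pi} d\chi
\left\{\left[\frac{\hbar}{Pr}\, e^{iPr\cos\omega/\hbar}\,\langle P\omega\chi|\,\rangle\right]_{\omega=0}^{\omega=\pi}
- \frac{\hbar}{Pr}\int_0^{\pi} e^{iPr\cos\omega/\hbar}\,\frac{\partial}{\partial\omega}\langle P\omega\chi|\,\rangle\, d\omega\right\}.
\end{aligned}$$

The second term in the { } brackets is of order $r^{-2}$, as may be verified by further
partial integrations with respect to $\omega$, and can therefore be neglected. We are
left with

$$\begin{aligned}
\langle r0\phi|\,\rangle &= i h^{-1/2}(2\pi r)^{-1}\int_0^\infty P\, dP \int_0^{2\pi} d\chi\,
\left\{e^{-iPr/\hbar}\,\langle P\pi\chi|\,\rangle - e^{iPr/\hbar}\,\langle P0\chi|\,\rangle\right\} \\
&= i h^{-1/2} r^{-1}\int_0^\infty P\, dP\,
\left\{e^{-iPr/\hbar}\,\langle P\pi\chi|\,\rangle - e^{iPr/\hbar}\,\langle P0\chi|\,\rangle\right\}.
\end{aligned} \tag{25}$$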


When we substitute for $\langle P\omega\chi|\,\rangle$ its value given by (24), the first term in
the integrand in (25) gives

$$i h^{-1/2} r^{-1}\int_0^\infty P\, dP\; e^{-iPr/\hbar}\,
\left\{f(P\pi\chi)/(W' - W) + \lambda(\pi\chi)\,\delta(W' - W)\right\}. \tag{26}$$

The term involving $\delta(W' - W)$ here may be integrated immediately and gives,
when one uses the relation $P\,dP = W\,dW/c^2$, which follows from (16),

$$i h^{-1/2} c^{-2} r^{-1}\int_{mc^2}^\infty W\, dW\; e^{-iPr/\hbar}\,\lambda(\pi\chi)\,\delta(W' - W)
= i h^{-1/2} c^{-2} r^{-1}\, W'\,\lambda(\pi\chi)\, e^{-iP'r/\hbar}. \tag{27}$$

To integrate the other term in (26) we use the formula

$$\int_0^\infty g(P)\,\frac{e^{-iPr/\hbar}}{P' - P}\, dP = g(P')\int_0^\infty \frac{e^{-iPr/\hbar}}{P' - P}\, dP, \tag{28}$$

with neglect of terms involving $r^{-1}$, for any continuous function $g(P)$,
which formula holds since $\int_0^\infty K(P)\, e^{-iPr/\hbar}\, dP$ is of order $r^{-1}$ for any continuous
function $K(P)$ and since the difference $g(P)/(P' - P) - g(P')/(P' - P)$ is
continuous. The right-hand side of (28), when evaluated with neglect of terms
involving $r^{-1}$, and also with neglect of the small domain $P' - \epsilon$ to $P' + \epsilon$ in the domain
of integration, gives

$$\begin{aligned}
g(P')\int_0^\infty \frac{e^{-iPr/\hbar}}{P' - P}\, dP
&= g(P')\, e^{-iP'r/\hbar}\int_{-\infty}^{\infty} \frac{e^{i(P'-P)r/\hbar}}{P' - P}\, dP \\
&= i\, g(P')\, e^{-iP'r/\hbar}\int_{-\infty}^{\infty} \frac{\sin\{(P' - P)r/\hbar\}}{P' - P}\, dP
= i\pi\, g(P')\, e^{-iP'r/\hbar}.
\end{aligned} \tag{29}$$

In our present example $g(P)$ is

$$g(P) = i h^{-1/2} r^{-1}\, P f(P\pi\chi)\,(P' - P)/(W' - W),$$

which has the limiting value when $P = P'$,

$$g(P') = i h^{-1/2} r^{-1}\, P' f(P'\pi\chi)\, W'/P'c^2 = i h^{-1/2} c^{-2} r^{-1}\, W' f(P'\pi\chi).$$

Substituting this in (29) and adding on the expression (27), we obtain the following
value for the integral (26)

$$h^{-1/2} c^{-2} r^{-1}\, W'\left\{-\pi f(P'\pi\chi) + i\lambda(\pi\chi)\right\} e^{-iP'r/\hbar}. \tag{30}$$

Similarly the second term in the integrand in (25) gives

$$h^{-1/2} c^{-2} r^{-1}\, W'\left\{-\pi f(P'0\chi) - i\lambda(0\chi)\right\} e^{iP'r/\hbar}. \tag{31}$$

The sum of these two expressions is the value of $\langle r0\phi|\,\rangle$ when $r$ is large.

We require that $\langle r0\phi|\,\rangle$ shall represent only outward moving particles,
and hence it must be of the form of a multiple of $e^{iP'r/\hbar}$. Thus (30) must vanish,
so that

$$\lambda(\pi\chi) = -i\pi f(P'\pi\chi). \tag{32}$$

We see in this way that the condition that $\langle r0\phi|\,\rangle$ shall represent only outward
moving particles in the direction $\theta = 0$ fixes the value of $\lambda$ for the opposite direction
$\theta = \pi$. Since the direction $\theta = 0$ or $\omega = 0$ of the pole of our polar coordinates is
not in any way singular, we can generalize (32) to

$$\lambda(\omega\chi) = -i\pi f(P'\omega\chi), \tag{33}$$

which gives the value of $\lambda$ for an arbitrary direction. This value substituted in
(24) gives a result that may be written

$$\langle P\omega\chi|\,\rangle = f(P\omega\chi)\left\{1/(W' - W) - i\pi\,\delta(W' - W)\right\}, \tag{34}$$

since one can substitute $P'$ for $P$ in the coefficient of a term involving $\delta(W' - W)$
as a factor without changing the value of the term. The condition that $\langle P\omega\chi|\,\rangle$
shall represent only outward moving particles is thus that it shall contain the factor

$$\left\{1/(W' - W) - i\pi\,\delta(W' - W)\right\}. \tag{35}$$

It is interesting to note that this factor is of the form of the right-hand side of
equation (15) of §15.
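
The factor (35) may also be recognised as the limit of 1/(W' - W + i eps) as eps
tends to zero through positive values, a standard identity that the text does not
state in this form. The sketch below checks it numerically against a Gaussian
test function; the test function, the value of eps and the integration grid are
illustrative assumptions.

```python
import math


def smeared_integral(g, eps, half_width=10.0, steps=20000):
    """Trapezoidal estimate of the integral of g(x)/(x + i*eps) over x.

    Here x stands for W' - W; the range and the step count are ad hoc choices.
    """
    h = 2.0 * half_width / steps
    total = 0.0 + 0.0j
    for n in range(steps + 1):
        x = -half_width + n * h
        weight = 0.5 if n in (0, steps) else 1.0
        total += weight * g(x) / complex(x, eps)
    return total * h


value = smeared_integral(lambda x: math.exp(-x * x), eps=0.01)
# The principal part of an even function integrates to zero, and the
# delta-function term contributes -i*pi*g(0) = -i*pi in the limit eps -> 0.
print(value.real, value.imag, -math.pi)
```

With eps small but finite the imaginary part differs from -pi by a correction of
order eps, which shrinks as eps is reduced.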

With $\lambda$ given by (33), expression (30) vanishes and the value of $\langle r0\phi|\,\rangle$ for
large $r$ is given by expression (31) alone, thus

$$\langle r0\phi|\,\rangle = -2\pi h^{-1/2} c^{-2} r^{-1}\, W' f(P'0\chi)\, e^{iP'r/\hbar}.$$

This may be generalized to

$$\langle r\theta\phi|\,\rangle = -2\pi h^{-1/2} c^{-2} r^{-1}\, W' f(P'\omega\chi)\, e^{iP'r/\hbar},$$

giving the value of $\langle r\theta\phi|\,\rangle$ for any direction $\theta$, $\phi$ in terms of $f(P'\omega\chi)$ for the same
direction labelled by $\omega$, $\chi$. This is of the form (12) with

$$u(\theta\phi) = -2\pi h^{-1/2} c^{-2}\, W' f(P'\omega\chi)$$

and thus represents a distribution of outward moving particles of momentum $P'$
whose number is

$$\frac{c^2 P'}{W'}\,|u|^2 = \frac{4\pi^2 W' P'}{h c^2}\,|f(P'\omega\chi)|^2 \tag{36}$$

per unit solid angle per unit time. This distribution is the one represented by
the $\langle P\omega\chi|\,\rangle$ of (34).

From this general result we can infer that, whenever we have a representative
$\langle P\omega\chi|\,\rangle$ representing only outward moving particles and satisfying an equation of
the type (23), the number per unit solid angle per unit time of these particles
is given by (36). If this $\langle P\omega\chi|\,\rangle$ occurs in a problem in which the number
of incident particles is one per unit volume, it will correspond to a scattering
coefficient of amount

$$\frac{4\pi^2 W^0 W' P'}{h c^4 P^0}\,|f(P'\omega\chi)|^2. \tag{37}$$

It is only the value of the function $f(P\omega\chi)$ for the point $P = P'$ that is
of importance.
If we now apply this general theory to our equations (21) and (22), we have

$$f(P\omega\chi) = h^{3/2}\,\langle P\omega\chi\alpha'|V|P^0\omega^0\chi^0\alpha^0\rangle.$$

Hence from (37) the scattering coefficient is

$$\frac{4\pi^2 h^2 W^0 W' P'}{c^4 P^0}\,|\langle P'\omega\chi\alpha'|V|P^0\omega^0\chi^0\alpha^0\rangle|^2. \tag{38}$$

If one neglects relativity and puts $W^0 W'/c^4 = m^2$, this result reduces to
the result (15) obtained in the preceding section by means of Green's theorem.
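
The passage from (38) to (15) can be checked numerically. The sketch below is
an illustration only (the function names and sample momenta are assumptions): it
evaluates the ratio of the kinematic factor in (38) to that in (15), namely
W0 W'/(m^2 c^4), and shows that it tends to unity as c becomes large compared
with the particle velocities.

```python
import math


def relativistic_energy(p, m, c):
    """W from equation (16): W^2 = m^2 c^4 + P^2 c^2."""
    return math.sqrt(m ** 2 * c ** 4 + p ** 2 * c ** 2)


def kinematic_ratio(p_initial, p_final, m, c):
    """Ratio of the prefactor in (38) to that in (15): W0 W' / (m^2 c^4)."""
    w0 = relativistic_energy(p_initial, m, c)
    w1 = relativistic_energy(p_final, m, c)
    return w0 * w1 / (m ** 2 * c ** 4)


# Fixed momenta, increasing c: the ratio approaches 1.
for c in (1.0, 10.0, 1000.0):
    print(c, kinematic_ratio(1.0, 0.8, 1.0, c))
```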


51. Dispersive scattering 
We shall now determine the scattering when the incident particle is capable of 
being absorbed, that is, when our unperturbed system of scatterer plus particle 
has closed stationary states with the particle absorbed. The existence of these 
closed states for the unperturbed system will be found to have a considerable effect 
on the scattering for the perturbed system, and indeed an effect that depends very 
much on the energy of the incident particle, giving rise to the phenomenon of 
dispersion in optics when the incident particle is taken to be a photon. 

We use a representation for which the basic kets correspond to the stationary
states of the unperturbed system, as was the case with the p-representation of
the preceding section. We take these stationary states to be the states $(p'\alpha')$ for
which the particle has a definite momentum $p'$ and the scatterer is in a definite
state $\alpha'$, together with the closed states, $k$ say, which form a separate discrete set,
and assume that these states are all independent and orthogonal. This assumption
is not accurate when the particle is an electron or atomic nucleus, since in this case
for an absorbed state $k$ the particle will still certainly be somewhere, so that
one would expect to be able to expand $|k\rangle$ in terms of the eigenkets $|x'\alpha'\rangle$ of
$x$, $y$, $z$ and the $\alpha$'s, and hence also in terms of the $|p'\alpha'\rangle$'s. On the other hand,
when the particle is a photon it will no longer exist for the absorbed states,
which are then certainly independent of and orthogonal to the states $(p'\alpha')$ for
which the particle does exist. Thus the assumption is valid in this case, which is
an important practical one.

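Since we are concerned with scattering, we must still deal with stationary states
of the whole system. We shall now, however, have to work to the second order of
accuracy, so that we cannot use merely the first-order equation (3), but must use
also (4). Equation (3) becomes, when written in terms of representatives in our
present representation,

$$\begin{aligned}
(W' - W)\,\langle p\alpha'|1\rangle &= \langle p\alpha'|V|0\rangle, \\
(E' - E_k)\,\langle k|1\rangle &= \langle k|V|0\rangle,
\end{aligned} \tag{39}$$

where $W'$ is the function of $E'$ and the $\alpha'$'s given by (18) and $E_k$ is the energy of
the stationary state $k$ of the unperturbed system. Similarly, equation (4) becomes

$$\begin{aligned}
(W' - W)\,\langle p\alpha'|2\rangle &= \langle p\alpha'|V|1\rangle, \\
(E' - E_k)\,\langle k|2\rangle &= \langle k|V|1\rangle.
\end{aligned} \tag{40}$$

Expanding the right-hand sides by matrix multiplication, we get

$$\begin{aligned}
(W' - W)\,\langle p\alpha'|2\rangle &= \sum_{\alpha''}\int \langle p\alpha'|V|p''\alpha''\rangle\, d^3p''\, \langle p''\alpha''|1\rangle
+ \sum_{k''} \langle p\alpha'|V|k''\rangle\,\langle k''|1\rangle, \\
(E' - E_k)\,\langle k|2\rangle &= \sum_{\alpha''}\int \langle k|V|p''\alpha''\rangle\, d^3p''\, \langle p''\alpha''|1\rangle
+ \sum_{k''} \langle k|V|k''\rangle\,\langle k''|1\rangle.
\end{aligned} \tag{41}$$

The ket $|0\rangle$ is still given by (19), so (39) may be written

$$(W' - W)\,\langle p\alpha'|1\rangle = h^{3/2}\,\langle p\alpha'|V|p^0\alpha^0\rangle, \tag{42}$$

$$(E' - E_k)\,\langle k|1\rangle = h^{3/2}\,\langle k|V|p^0\alpha^0\rangle. \tag{43}$$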

We may assume that the matrix elements $\langle k'|V|k''\rangle$ of $V$ vanish, since these
matrix elements are not essential to the phenomena under investigation,
and if they did not vanish it would mean simply that the absorbed states
$k$ had not been suitably chosen. We shall further assume that the matrix
elements $\langle p'\alpha'|V|p''\alpha''\rangle$ are of the second order of smallness when the matrix
elements $\langle k'|V|p''\alpha''\rangle$, $\langle p'\alpha'|V|k''\rangle$ are taken to be of the first order of smallness.
This assumption will be justified for the case of photons in §64. We now have
from (43) and (42) that $\langle k|1\rangle$ is of the first order of smallness, provided $E'$
does not lie near one of the discrete set of energy-levels $E_k$, and $\langle p\alpha'|1\rangle$ is of
the second order. The value of $\langle p\alpha'|2\rangle$ to the second order will thus be given,
from the first of equations (41), by

$$(W' - W)\,\langle p\alpha'|2\rangle = h^{3/2}\sum_{k''} \langle p\alpha'|V|k''\rangle\,\langle k''|V|p^0\alpha^0\rangle/(E' - E_{k''}).$$

The total correction in the wave function to the second order, namely $\langle p\alpha'|1\rangle$ plus
$\langle p\alpha'|2\rangle$, therefore satisfies

$$(W' - W)\,\{\langle p\alpha'|1\rangle + \langle p\alpha'|2\rangle\}
= h^{3/2}\left\{\langle p\alpha'|V|p^0\alpha^0\rangle + \sum_k \frac{\langle p\alpha'|V|k\rangle\,\langle k|V|p^0\alpha^0\rangle}{E' - E_k}\right\}.$$

This equation is of the type (23), provided $\alpha'$ is such that $W' > mc^2$, which means
that $\alpha'$ as a final state for the scatterer is not inconsistent with the law of
conservation of energy. We can therefore infer from the general result (37) that
the scattering coefficient is

$$\frac{4\pi^2 h^2 W^0 W' P'}{c^4 P^0}\left|\langle p'\alpha'|V|p^0\alpha^0\rangle
+ \sum_k \frac{\langle p'\alpha'|V|k\rangle\,\langle k|V|p^0\alpha^0\rangle}{E' - E_k}\right|^2. \tag{44}$$

The scattering may now be considered as composed of two parts, a part that
arises from the matrix element $\langle p'\alpha'|V|p^0\alpha^0\rangle$ of the perturbing energy and a part
that arises from the matrix elements $\langle p'\alpha'|V|k\rangle$ and $\langle k|V|p^0\alpha^0\rangle$. The first part,
which is the same as our previously obtained result (38), may be called the direct
scattering. The second part may be considered as arising from an absorption
of the incident particle into some state $k$, followed immediately by a re-emission
in a different direction, and is like the transitions through an intermediate state
considered in §44. The fact that we have to add the two terms before taking
the square of the modulus denotes interference between the two kinds of scattering.
There is no experimental way of separating the two kinds, the distinction between
them being only mathematical.
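
The interference between the two kinds of scattering can be made concrete with
a small numerical sketch. Everything in it is illustrative: the function names,
the single absorbed level and the sample matrix elements are assumptions, and the
kinematic prefactor of (44) is omitted since it is common to both parts.

```python
def dispersive_amplitude(direct, couplings, energy):
    """Bracket of formula (44): direct term plus a sum over absorbed states k.

    couplings is a list of (v_out, v_in, e_k) standing for <p'a'|V|k>,
    <k|V|p0 a0> and the level E_k; the values used below are made up.
    """
    total = complex(direct)
    for v_out, v_in, e_k in couplings:
        total += v_out * v_in / (energy - e_k)
    return total


def scattering_strength(direct, couplings, energy):
    """Squared modulus of the amplitude, i.e. (44) without its prefactor."""
    return abs(dispersive_amplitude(direct, couplings, energy)) ** 2


levels = [(0.2, 0.2, 1.0)]  # one absorbed state with E_k = 1.0
for e in (0.5, 0.9, 0.99, 1.01, 1.1, 1.5):
    print(e, scattering_strength(0.05, levels, e))
```

Below the level the two terms have opposite signs and partly cancel, above it
they reinforce one another, and in neither case is the result the sum of the two
squared moduli. The value grows without limit as the energy approaches E_k,
which signals that the approximation has ceased to be valid there.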


52. Resonance scattering 

Suppose the energy of the incident particle to be varied continuously while
the initial state $\alpha^0$ of the scatterer is kept fixed, so that the total energy $E'$ or
$H'$ varies continuously. The formula (44) now shows that as $E'$ approaches one
of the discrete set of energy-levels $E_k$, the scattering becomes very large. In fact,
according to formula (44) the scattering should be infinite when $E'$ is exactly equal
to an $E_k$. An infinite scattering coefficient is, of course, physically impossible,
so that we can infer that the approximations used in deriving (44) are no longer
legitimate when $E'$ is close to an $E_k$. To investigate the scattering in this case
we must therefore go back to the exact equation

$$(E' - E)\,|H'\rangle = V\,|H'\rangle,$$

equation (2) of §43 with $E'$ written for $H'$, and use a different method
of approximating to its solution. This exact equation, written in terms of
representatives like (41), becomes

$$\begin{aligned}
(W' - W)\,\langle p\alpha'|H'\rangle &= \sum_{\alpha''}\int \langle p\alpha'|V|p''\alpha''\rangle\, d^3p''\, \langle p''\alpha''|H'\rangle
+ \sum_{k''} \langle p\alpha'|V|k''\rangle\,\langle k''|H'\rangle, \\
(E' - E_k)\,\langle k|H'\rangle &= \sum_{\alpha''}\int \langle k|V|p''\alpha''\rangle\, d^3p''\, \langle p''\alpha''|H'\rangle
+ \sum_{k''} \langle k|V|k''\rangle\,\langle k''|H'\rangle.
\end{aligned} \tag{45}$$
Let us take one particular $E_k$ and consider the case when E’ is close to it.
The large term in the scattering coefficient (44) now arises from those elements of
the matrix representing V that lie in row k or in column k, i.e. those of the type
(k| V |pa’) or (pa’| V |k). The scattering arising from the other matrix elements of
V is of a smaller order of magnitude. This suggests that in our exact equations (45)
we should make the approximation of neglecting all the matrix elements of V except
the important ones, which are those of the type (pa’| V |k) or (k| V |pa’), where a’
is a state of the scatterer that has not too much energy to be disallowed as a final
state by the law of conservation of energy. These equations then reduce to

$$\begin{aligned}
(W'-W)\langle pa'|H'\rangle &= \langle pa'|V|k\rangle\langle k|H'\rangle,\\
(E'-E_k)\langle k|H'\rangle &= \sum_{a'}\int \langle k|V|pa'\rangle\,d^3p\,\langle pa'|H'\rangle,
\end{aligned} \tag{46}$$

the a’ summation being over those values of a’ for which W’ given by (18) is > mc².
These equations are now sufficiently simple for us to be able to solve exactly
without further approximation.
From the first of equations (46) we obtain by division

$$\langle pa'|H'\rangle = \frac{\langle pa'|V|k\rangle\langle k|H'\rangle}{W'-W} + \lambda\,\delta(W'-W). \tag{47}$$

We must choose λ, which may be any function of the momentum p and a’, such that
(47) represents the incident particles corresponding to |0) or $h^{3/2}|p^0a^0\rangle$ together
with only outward moving particles. [The representative of $h^{3/2}|p^0a^0\rangle$ is actually of
the form λδ(W’ − W), since the conditions a’ = a° and p = p° for it not to vanish
lead to $W' = E' - H_s(a') = E' - H_s(a^0) = W^0 = W$.] Thus (47) must be

$$\langle pa'|H'\rangle = h^{3/2}\langle pa'|p^0a^0\rangle + \langle pa'|V|k\rangle\langle k|H'\rangle\,\{1/(W'-W) - i\pi\,\delta(W'-W)\}, \tag{48}$$

and from the general formula (37) the scattering coefficient will be

$$\frac{4\pi^2W^0W'P'}{hc^4P^0}\,|\langle p'a'|V|k\rangle|^2\,|\langle k|H'\rangle|^2. \tag{49}$$

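It remains for us to determine the value of (k | H’). We can do this by
substituting for (pa’ | H’) in the second of equations (46) its value given by (48).
This gives

$$\begin{aligned}
(E'-E_k)\langle k|H'\rangle &= h^{3/2}\langle k|V|p^0a^0\rangle + \langle k|H'\rangle \sum_{a'}\int |\langle k|V|pa'\rangle|^2\,\{1/(W'-W) - i\pi\,\delta(W'-W)\}\,d^3p\\
&= h^{3/2}\langle k|V|p^0a^0\rangle + \langle k|H'\rangle\,(a - ib),
\end{aligned}$$

where

$$a = \sum_{a'}\int |\langle k|V|pa'\rangle|^2\,d^3p\,/(W'-W) \tag{50}$$

and

$$\begin{aligned}
b &= \pi\sum_{a'}\int |\langle k|V|pa'\rangle|^2\,\delta(W'-W)\,d^3p\\
&= \pi\sum_{a'}\iiint |\langle k|V|P\omega\chi a'\rangle|^2\,\delta(W'-W)\,P^2\,dP\,\sin\omega\,d\omega\,d\chi\\
&= \pi\sum_{a'} P'W'c^{-2}\iint |\langle k|V|P'\omega\chi a'\rangle|^2\,\sin\omega\,d\omega\,d\chi.
\end{aligned} \tag{51}$$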


‘Tp prime omitted.] 


Thus

$$\langle k|H'\rangle = \frac{h^{3/2}\langle k|V|p^0a^0\rangle}{E'-E_k-a+ib}. \tag{52}$$

Note that a and b are real and that b is positive.
This value for (k | H’) substituted in (49) gives for the scattering coefficient

$$\frac{4\pi^2h^2W^0W'P'}{c^4P^0}\,\frac{|\langle p'a'|V|k\rangle|^2\,|\langle k|V|p^0a^0\rangle|^2}{(E'-E_k-a)^2+b^2}. \tag{53}$$

One can obtain the total effective area that the incident particle must hit in order
to be scattered anywhere by integrating (53) over all directions of scattering, i.e. by
integrating over all directions of the vector p’ with its magnitude kept fixed at P’,
and then summing over all a’ that are to be taken into consideration, i.e. for which
W’ > mc². This gives, with the help of (51), the result

$$\frac{4\pi h^2W^0}{c^2P^0}\,\frac{b\,|\langle k|V|p^0a^0\rangle|^2}{(E'-E_k-a)^2+b^2}. \tag{54}$$

If we suppose E’ to vary continuously through the value $E_k$, the main variation
of (53) or (54) will be due to the small denominator $(E'-E_k-a)^2+b^2$. If we neglect
the dependence of the other factors in (53) and (54) on E’, then the maximum
scattering will occur when E’ has the value $E_k + a$ and the scattering will be half
its maximum when E’ differs from this value by an amount b. The large amount
of scattering that occurs for values of the energy of the incident particle that
make E’ nearly equal to $E_k$ gives rise to the phenomenon of an absorption line.
The centre of the line is displaced by an amount a from the resonance energy of
the incident particle, i.e. the energy which would make the total energy just $E_k$,
while the quantity b is what is sometimes called the half-width of the line.
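
The statements about the position and width of the line can be checked numerically. The following is a minimal sketch, assuming only the energy-dependent factor 1/((E’ − E_k − a)² + b²) common to (53) and (54); the function name and the numerical values of E_k, a and b are hypothetical and chosen purely for illustration.

```python
def lorentzian(E, E_k, a, b):
    """Energy-dependent factor 1 / ((E - E_k - a)^2 + b^2) of (53) and (54)."""
    return 1.0 / ((E - E_k - a) ** 2 + b ** 2)

# Hypothetical values in arbitrary energy units, chosen only for illustration.
E_k, a, b = 10.0, 0.3, 0.05

peak = lorentzian(E_k + a, E_k, a, b)                    # value at the line centre
ratio_above = lorentzian(E_k + a + b, E_k, a, b) / peak  # one half-width above the centre
ratio_below = lorentzian(E_k + a - b, E_k, a, b) / peak  # one half-width below the centre

# Scan a grid of energies to confirm that the maximum sits at E_k + a.
grid = [E_k + i * 0.001 for i in range(0, 601)]
E_max = max(grid, key=lambda E: lorentzian(E, E_k, a, b))

print(E_max, ratio_above, ratio_below)
```

Both ratios come out as one half to within rounding, and the scan places the maximum at E_k + a, the displaced line centre.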


53. Emission and absorption 

For studying emission and absorption we must consider non-stationary states
of the system and must use the perturbation method of §44. To determine
the coefficient of spontaneous emission we must take an initial state for which
the particle is absorbed, corresponding to a ket |k), and determine the probability
that at some later time the particle shall be on its way to infinity with a definite
momentum. The method of §46 can now be applied. From the result (39) of
that section we see that the probability per unit time per unit range of ω and χ,
of the particle being emitted in any direction ω’, χ’ with the scatterer being left in
state a’ is

$$2\pi\hbar^{-1}\,|\langle W'\omega'\chi'a'|V|k\rangle|^2, \tag{55}$$

provided, of course, that a’ is such that the energy W’, given by (18), of the particle
is greater than mc². For values of a’ that do not satisfy this condition there
is no emission possible. The matrix element $\langle W'\omega'\chi'a'|V|k\rangle$ here must refer
to a representation in which W, ω, χ and a are diagonal with the weight
function unity. The matrix elements of V appearing in the three preceding
sections refer to a representation in which $p_x$, $p_y$, $p_z$ are diagonal with the weight
function unity, or P, ω, χ are diagonal with the weight function P² sin ω.
They would thus refer to a representation in which W, ω, χ are diagonal with
the weight function (dP/dW)P² sin ω = (WP/c²) sin ω. Thus the matrix element
$\langle W'\omega'\chi'a'|V|k\rangle$ in (55) is equal to $((W'P'/c^2)\sin\omega')^{1/2}$ times our previous matrix
element $\langle W'\omega'\chi'a'|V|k\rangle$ or $\langle p'a'|V|k\rangle$, so that (55) is equal to

$$\frac{2\pi}{\hbar}\,\frac{W'P'}{c^2}\,\sin\omega'\,|\langle p'a'|V|k\rangle|^2.$$

The probability of emission per unit solid angle per unit time, with the scatterer
simultaneously dropping to state a’, is thus

$$\frac{2\pi}{\hbar}\,\frac{W'P'}{c^2}\,|\langle p'a'|V|k\rangle|^2. \tag{56}$$

To obtain the total probability per unit time of the particle being emitted
in any direction, with any final state for the scatterer, we must integrate (56)
over all angles ω’, χ’ and sum over all states a’ whose energy $H_s(a')$ is such that
$H_s(a') + mc^2 < E_k$. The result is just 2b/ħ, where b is defined by (51). There is
thus this simple relation between the total emission coefficient and the half-width b
of the absorption line.
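
As a check on this relation, the integration can be written out. The sketch below assumes that V is a real (Hermitian) operator, so that the moduli of ⟨p’a’|V|k⟩ and ⟨k|V|p’a’⟩ are equal, and takes (56) in the form (2π/ħ)(W’P’/c²)|⟨p’a’|V|k⟩|² together with the last line of (51).

```latex
\sum_{a'} \iint \frac{2\pi}{\hbar}\,\frac{W'P'}{c^2}\,
  |\langle p'a'|V|k\rangle|^2 \sin\omega'\,d\omega'\,d\chi'
= \frac{2}{\hbar}\,\Big[\pi \sum_{a'} P'W'c^{-2}
  \iint |\langle k|V|P'\omega\chi a'\rangle|^2 \sin\omega\,d\omega\,d\chi\Big]
= \frac{2b}{\hbar}.
```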

Let us now consider absorption. This requires that we shall take an initial 
state for which the particle is certainly not absorbed but is incident with 
a definite momentum. Thus the ket corresponding to the initial state must be 
of the form (19). We must now determine the probability of the particle being 
absorbed after time t. Since our final state k is not one of a continuous range,
we cannot use directly the result (39) of §46. If, however, we take

$$|0\rangle = |p^0a^0\rangle \tag{57}$$

as the ket corresponding to the initial state, the analysis of §§44 and 46 is still
applicable as far as equation (36) and shows us that the probability of the particle
being absorbed into state k after time t is

$$2\,|\langle k|V|p^0a^0\rangle|^2\,[1-\cos\{(E_k-E')t/\hbar\}]/(E_k-E')^2.$$

This corresponds to a distribution of incident particles of density $h^{-3}$, owing to
the omission of the factor $h^{3/2}$ from (57), as compared with (19). The probability of
there being an absorption after time t when there is one incident particle crossing
unit area per unit time is therefore

$$(2h^3W^0/c^2P^0)\,|\langle k|V|p^0a^0\rangle|^2\,[1-\cos\{(E_k-E')t/\hbar\}]/(E_k-E')^2. \tag{58}$$

To obtain the absorption coefficient we must consider the incident particles not
all to have exactly the same energy $W^0 = E' - H_s(a^0)$, but to have a distribution
of energy values about the correct value $E_k - H_s(a^0)$ required for absorption.
If we take a beam of incident particles consisting of one crossing unit area per
unit time per unit energy range, the probability of there being an absorption after
time t will be given by the integral of (58) with respect to E’. This integral may
be evaluated in the same way as (37) of §46 and is equal to

$$(4\pi^2h^2W^0t/c^2P^0)\,|\langle k|V|p^0a^0\rangle|^2.$$

The probability per unit time of an absorption taking place with an incident beam
of one particle per unit area per unit time per unit energy range is therefore

$$(4\pi^2h^2W^0/c^2P^0)\,|\langle k|V|p^0a^0\rangle|^2, \tag{59}$$

which is the absorption coefficient.
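
The step from (58) to the result proportional to t rests on the integral of [1 − cos(xt/ħ)]/x² over all x, with x = E_k − E’, being equal to πt/ħ. The following is a rough numerical sketch of that integral, assuming a plain midpoint rule with a cut-off; the function name, the cut-off and the step count are illustrative choices, and `tau` stands for t/ħ.

```python
import math

def absorption_time_integral(tau, cutoff=200.0, steps=200_000):
    """Approximate the integral of (1 - cos(tau * x)) / x**2 over all real x.

    The integrand is even, so only x > 0 is integrated (midpoint rule up to
    x = cutoff / tau) and the result is doubled. Beyond the cut-off the
    non-oscillating part 1 / x**2 is integrated exactly; the small
    oscillating remainder is neglected.
    """
    upper = cutoff / tau
    step = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * step          # midpoints avoid the removable point x = 0
        total += (1.0 - math.cos(tau * x)) / (x * x)
    half = total * step + 1.0 / upper  # add the tail of 1/x**2 from `upper` to infinity
    return 2.0 * half

tau = 2.5                              # hypothetical value of t / hbar
value = absorption_time_integral(tau)
print(value, math.pi * tau)
```

The two printed numbers agree to about four significant figures, which is the linear growth with t used in passing from (58) to (59).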

The connexion between the absorption and emission coefficients (59) and (56) 
and the resonance scattering coefficients calculated in the preceding section should 
be noted. When the incident beam does not consist of particles all with the same 
energy, but consists of a unit distribution of particles per unit energy range crossing 
unit area per unit time, the total number of incident particles with energies near 
an absorption line that get scattered will be given by the integral of (54) with 
respect to E’. If one neglects the dependence of the numerator of (54) on E’,
this integral will, since

$$\int_{-\infty}^{\infty} \frac{b}{(E'-E_k-a)^2+b^2}\,dE' = \pi,$$

have just the value (59). Thus the total number of scattered particles in
the neighbourhood of an absorption line is equal to the total number absorbed.
We can therefore regard all these scattered particles as absorbed particles that are
subsequently re-emitted in a different direction. Further, the number of particles
in the neighbourhood of the absorption line that get scattered per unit solid angle
about a given direction specified by p’ and then belong to scatterers in state a’ will
be given by the integral with respect to E’ of (53), which integral has in the same
way the value

$$\frac{4\pi^3h^2W^0W'P'}{bc^4P^0}\,|\langle p'a'|V|k\rangle|^2\,|\langle k|V|p^0a^0\rangle|^2.$$

This is just equal to the absorption coefficient (59) multiplied by the emission
coefficient (56) divided by 2b/ħ, the total emission coefficient. This is in agreement
with the point of view of regarding the resonance scattered particles as those that
are absorbed and then re-emitted, with the absorption and emission processes
governed independently each by its own probability law, since this point of view
would make the fraction of the total number of absorbed particles that are
re-emitted in a unit solid angle about a given direction just the emission coefficient
for this direction divided by the total emission coefficient.
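
The bookkeeping in this comparison can be verified with arbitrary numbers. The sketch below assumes the forms (59) = 4π²h²W⁰|⟨k|V|p⁰a⁰⟩|²/c²P⁰ and (56) = (2π/ħ)(W’P’/c²)|⟨p’a’|V|k⟩|², and uses the fact that the integral of 1/((E’ − E_k − a)² + b²) over E’ is π/b. All numerical inputs are hypothetical.

```python
import math

# Hypothetical inputs in arbitrary consistent units.
hbar = 1.0
h = 2.0 * math.pi * hbar
c = 1.0
W0, P0 = 2.0, 1.5      # energy and momentum of the incident particle
W1, P1 = 1.7, 1.1      # energy W' and momentum P' of the re-emitted particle
b = 0.05               # half-width defined by (51)
abs_sq = 0.02          # stands for |<k|V|p0 a0>|^2
em_sq = 0.03           # stands for |<p'a'|V|k>|^2

absorption = 4 * math.pi**2 * h**2 * W0 / (c**2 * P0) * abs_sq   # (59)
emission = (2 * math.pi / hbar) * (W1 * P1 / c**2) * em_sq        # (56)
total_emission = 2 * b / hbar

# (53) integrated over E': its prefactor times pi / b.
prefactor_53 = 4 * math.pi**2 * h**2 * W0 * W1 * P1 / (c**4 * P0) * em_sq * abs_sq
scattered_per_solid_angle = prefactor_53 * math.pi / b

# Absorption followed by independent re-emission into the given direction.
product_rule = absorption * emission / total_emission
print(scattered_per_solid_angle, product_rule)
```

The two printed values coincide, as the absorb-then-re-emit picture requires.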


IX. SYSTEMS CONTAINING 
SEVERAL SIMILAR PARTICLES 


54. Symmetrical and antisymmetrical states 

IF a system in atomic physics contains a number of particles of the same kind, 
e.g. a number of electrons, the particles are absolutely indistinguishable one 
from another. No observable change is made when two of them are interchanged. 
This circumstance gives rise to some curious phenomena in quantum mechanics 
having no analogue in the classical theory, which arise from the fact that in 
quantum mechanics a transition may occur resulting in merely the interchange 
of two similar particles, which transition then could not be detected by any 
observational means. A satisfactory theory ought, of course, to count two 
observationally indistinguishable states as the same state and to deny that any 
transition does occur when two similar particles exchange places. We shall find 
that it is possible to reformulate the theory so that this is so. 

Suppose we have a system containing n similar particles. We may take
as our dynamical variables a set of variables $\xi_1$ describing the first particle,
the corresponding set $\xi_2$ describing the second particle, and so on up to the set $\xi_n$
describing the nth particle. We shall then have the $\xi_r$’s commuting with the $\xi_s$’s
for r ≠ s. (We may require certain extra variables, describing what the system
consists of in addition to the n similar particles, but it is not necessary to mention
these explicitly in the present chapter.) The Hamiltonian describing the motion
of the system will now be expressible as a function of the $\xi_1, \xi_2, \ldots, \xi_n$. The fact
that the particles are similar requires that the Hamiltonian shall be a symmetrical
function of the $\xi_1, \xi_2, \ldots, \xi_n$, i.e. it shall remain unchanged when the sets of
variables $\xi_r$ are interchanged or permuted in any way. This condition must hold,
no matter what perturbations are applied to the system. In fact, any quantity of
physical significance must be a symmetrical function of the ξ’s.

Let $|a_1\rangle, |b_1\rangle, \ldots$ be kets for the first particle considered as a dynamical system
by itself. There will be corresponding kets $|a_2\rangle, |b_2\rangle, \ldots$ for the second particle
by itself, and so on. We can get a ket for the assembly by taking the product of
kets for each particle by itself, for example

$$|a_1\rangle|b_2\rangle|c_3\rangle\cdots|g_n\rangle = |a_1b_2c_3\ldots g_n\rangle \tag{1}$$

say, according to the notation of (65) of §20. The ket (1) corresponds to a special
kind of state for the assembly, which may be described by saying that each particle
is in its own state, corresponding to its own factor on the left-hand side of (1).
The general ket for the assembly is of the form of a sum or integral of kets like (1),
and corresponds to a state for the assembly for which one cannot say that each
particle is in its own state, but only that each particle is partly in several states,
in a way which is correlated with the other particles being partly in several states.
If the kets $|a_1\rangle, |b_1\rangle, \ldots$ are a set of basic kets for the first particle by itself,
the kets $|a_2\rangle, |b_2\rangle, \ldots$ will be a set of basic kets for the second particle by itself,
and so on, and the kets (1) will be a set of basic kets for the assembly. We call
the representation provided by such basic kets for the assembly a symmetrical
representation, as it treats all the particles on the same footing.

In (1) we may interchange the kets for the first two particles and get another
ket for the assembly, namely

$$|b_1\rangle|a_2\rangle|c_3\rangle\cdots|g_n\rangle = |b_1a_2c_3\ldots g_n\rangle.$$

More generally, we may interchange the role of the first two particles in any ket for
the assembly and get another ket for the assembly. The process of interchanging
the first two particles is an operator which can be applied to kets for the assembly,
and is evidently a linear operator, of the type dealt with in §7. Similarly,
the process of interchanging any pair of particles is a linear operator, and by 
repeated applications of such interchanges we get any permutation of the particles 
appearing as a linear operator which can be applied to kets for the assembly. 
A permutation is called an even permutation or an odd permutation according to 
whether it can be built up from an even or an odd number of interchanges. 
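
The definition of even and odd permutations can be turned into a short computation. The following is a minimal sketch, assuming a permutation is stored as a sequence whose entry at position i is the image of i (counting from 0); the function name `parity` is an illustrative choice. It undoes the permutation by explicit interchanges and counts them.

```python
def parity(perm):
    """Return +1 for an even permutation and -1 for an odd one.

    `perm[i]` is the image of i (0-based). The permutation is reduced to
    the identity by interchanges of pairs; only whether the number of
    interchanges is even or odd matters.
    """
    perm = list(perm)
    swaps = 0
    for i in range(len(perm)):
        while perm[i] != i:
            j = perm[i]
            # One interchange: it puts the value j into its own place j.
            perm[i], perm[j] = perm[j], perm[i]
            swaps += 1
    return 1 if swaps % 2 == 0 else -1

print(parity([0, 1, 2]), parity([1, 0, 2]), parity([1, 2, 0]))
```

A single interchange is odd, while a cyclic permutation of three particles is even, being built from two interchanges.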

A ket for the assembly |X) is called symmetrical if it is unchanged by any
permutation, i.e. if

$$P|X\rangle = |X\rangle \tag{2}$$

for any permutation P. It is called antisymmetrical if it is unchanged by any even
permutation and has its sign changed by any odd permutation, i.e. if

$$P|X\rangle = \pm|X\rangle, \tag{3}$$

the + or − sign being taken according to whether P is even or odd. The state
corresponding to a symmetrical ket is called a symmetrical state, and the state 
corresponding to an antisymmetrical ket is called an antisymmetrical state. 
In a symmetrical representation, the representative of a symmetrical ket is 
a symmetrical function of the variables referring to the various particles and 
the representative of an antisymmetrical ket is an antisymmetrical function. 

In the Schrödinger picture, the ket corresponding to a state of the assembly
will vary with time according to Schrödinger’s equation of motion. If it is initially
symmetrical it must always remain symmetrical, since, owing to the Hamiltonian
being symmetrical, there is nothing to disturb the symmetry. Similarly if the ket
is initially antisymmetrical it must always remain antisymmetrical. Thus a state
which is initially symmetrical always remains symmetrical and a state which is
initially antisymmetrical always remains antisymmetrical. In consequence, it may
be that for a particular kind of particle only symmetrical states occur in nature,
or only antisymmetrical states occur in nature. If either of these possibilities held,
it would lead to certain special phenomena for the particles in question.

Let us suppose first that only antisymmetrical states occur in nature. 
The ket (1) is not antisymmetrical and so does not correspond to a state occurring 
in nature. From (1) we can in general form an antisymmetrical ket by applying all 
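possible permutations to it and adding the results, with the coefficient −1 inserted
before those terms arising from an odd permutation, so as to get

$$\sum_P \pm P\,|a_1b_2c_3\ldots g_n\rangle, \tag{4}$$

the + or − sign being taken according to whether P is even or odd. The ket (4)
may be written as a determinant

$$\begin{vmatrix}
|a_1\rangle & |a_2\rangle & |a_3\rangle & \cdots & |a_n\rangle\\
|b_1\rangle & |b_2\rangle & |b_3\rangle & \cdots & |b_n\rangle\\
|c_1\rangle & |c_2\rangle & |c_3\rangle & \cdots & |c_n\rangle\\
\vdots & \vdots & \vdots & & \vdots\\
|g_1\rangle & |g_2\rangle & |g_3\rangle & \cdots & |g_n\rangle
\end{vmatrix} \tag{5}$$
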

and its representative in a symmetrical representation is a determinant. The ket (4) 
or (5) is not the general antisymmetrical ket, but is a specially simple one. 
It corresponds to a state for the assembly for which one can say that certain 
particle-states, namely the states a, b, c,..., g are occupied, but one cannot say 
which particle is in which state, each particle being equally likely to be in any state. 
If two of the particle-states a, b, c,..., g are the same, the ket (4) or (5) vanishes 
and does not correspond to any state for the assembly. Thus two particles cannot 
occupy the same state. More generally, the occupied states must be all independent, 
otherwise (4) or (5) vanishes. This is an important characteristic of particles for 
which only antisymmetrical states occur in nature. It leads to a special statistics, 
which was first studied by Enrico Fermi, so we shall call particles for which only 
antisymmetrical states occur in nature fermions. 

Let us suppose now that only symmetrical states occur in nature. The ket (1) is
not symmetrical, except in the special case when all the particle-states a, b, c, ..., g
are the same, but we can always obtain a symmetrical ket from it by applying all
possible permutations to it and adding the results, so as to get

$$\sum_P P\,|a_1b_2c_3\ldots g_n\rangle. \tag{6}$$

The ket (6) is not the general symmetrical ket, but is a specially simple one.
It corresponds to a state for the assembly for which one can say that certain
particle-states are occupied, namely the states a, b, c, ..., g, without being able
to say which particle is in which state. It is now possible for two or more of
the states a, b, c, ..., g to be the same, so that two or more particles can be in
the same state. In spite of this, the statistics of the particles is not the same as
the usual statistics of the classical theory. The new statistics was first studied by
Satyendra Nath Bose, so we shall call particles for which only symmetrical states
occur in nature bosons.
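
The two constructions, with and without the sign factor, can be tried out on a small assembly. The following is a minimal sketch, assuming a ket for the assembly is stored as a dictionary from product kets (tuples of state labels, one per particle) to integer coefficients; the names `sign` and `superpose` are illustrative choices.

```python
from itertools import permutations

def sign(p):
    """+1 for an even permutation of 0..n-1, -1 for an odd one (inversion count)."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def superpose(labels, antisymmetric):
    """Apply every permutation to the product ket and add the results.

    labels[i] is the state assigned to particle i + 1. With antisymmetric=True
    the terms from odd permutations carry the coefficient -1. Terms whose
    coefficients cancel to zero are dropped from the result.
    """
    ket = {}
    for p in permutations(range(len(labels))):
        term = tuple(labels[i] for i in p)
        weight = sign(p) if antisymmetric else 1
        ket[term] = ket.get(term, 0) + weight
    return {term: coeff for term, coeff in ket.items() if coeff != 0}

fermi_abc = superpose(("a", "b", "c"), antisymmetric=True)   # three different states
fermi_aac = superpose(("a", "a", "c"), antisymmetric=True)   # state a used twice
bose_aac = superpose(("a", "a", "c"), antisymmetric=False)
print(len(fermi_abc), fermi_aac, bose_aac)
```

With two equal particle-states the antisymmetrical sum cancels term by term and nothing is left, whereas the symmetrical sum survives with three distinct product kets.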

We can see the difference of Bose statistics from the usual statistics by
considering a special case, that of only two particles and only two independent
states a and b for a particle. According to classical mechanics, if the assembly of
two particles is in thermodynamic equilibrium at a high temperature, each particle
will be equally likely to be in either state. There is thus a probability 1/4 of
both particles being in state a, a probability 1/4 of both particles being in state b,
and a probability 1/2 of one particle being in each state. In the quantum theory there
are three independent symmetrical states for the pair of particles, corresponding
to the symmetrical kets $|a_1\rangle|a_2\rangle$, $|b_1\rangle|b_2\rangle$ and $|a_1\rangle|b_2\rangle + |a_2\rangle|b_1\rangle$, and describable as
both particles in state a, both particles in state b, and one particle in each state
respectively. For thermodynamic equilibrium at a high temperature these three
states are equally probable, as was shown in §33, so that there is a probability 1/3
of both particles being in state a, a probability 1/3 of both particles being in state b,
and a probability 1/3 of one particle being in each state. Thus with Bose statistics
the probability of two particles being in the same state is greater than with classical
statistics. Bose statistics differ from classical statistics in the opposite direction to
Fermi statistics, for which the probability of two particles being in the same state
is zero.
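
The counting in this two-particle example can be reproduced by enumeration. The following is a minimal sketch, assuming equal weights for the allowed states in each kind of statistics (the high-temperature case described above); the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

states = ("a", "b")

# Classical counting: the particles are treated as distinguishable, so the
# four ordered assignments (first particle, second particle) are equally likely.
classical = list(product(states, repeat=2))
p_same_classical = Fraction(sum(s[0] == s[1] for s in classical), len(classical))

# Bose counting: only the occupation matters, so the assignments (a, b) and
# (b, a) are one and the same symmetrical state.
bose = sorted({tuple(sorted(s)) for s in classical})
p_same_bose = Fraction(sum(s[0] == s[1] for s in bose), len(bose))

# Fermi counting: occupations with both particles in one state are excluded.
fermi = [s for s in bose if s[0] != s[1]]
p_same_fermi = Fraction(sum(s[0] == s[1] for s in fermi), len(fermi))

print(p_same_classical, p_same_bose, p_same_fermi)
```

The probability of finding both particles in the same state comes out as 1/2 classically, 2/3 with Bose statistics and 0 with Fermi statistics.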

In building up a theory of atoms on the lines mentioned at the beginning 
of §38, to get agreement with experiment one must assume that two electrons 
are never in the same state. This rule is known as Pauli’s exclusion principle. 
It shows us that electrons are fermions. Planck’s law of radiation shows us 
that photons are bosons, as only the Bose statistics for photons will lead to 
Planck’s law. Similarly, for each of the other kinds of particle known in physics, 
there is experimental evidence to show either that they are fermions, or that they 
are bosons. Protons, neutrons, positrons are fermions, α-particles are bosons.
It appears that all particles occurring in nature are either fermions or bosons, 
and thus only antisymmetrical or symmetrical states for an assembly of similar 
particles are met with in practice. Other more complicated kinds of symmetry are 
possible mathematically, but do not apply to any known particles. With a theory 
which allows only antisymmetrical or only symmetrical states for a particular 
kind of particle, one cannot make a distinction between two states which differ 
only through a permutation of the particles, so that the transitions mentioned at 
the beginning of this section disappear.


55. Permutations as dynamical variables

We shall now build up a general theory for a system containing n similar particles 
when states with any kind of symmetry properties are allowed, i.e. when there is no 
restriction to only symmetrical or only antisymmetrical states. The general state 
now will not be symmetrical or antisymmetrical, nor will it be expressible linearly 
in terms of symmetrical and antisymmetrical states when n > 2. This theory will 
not apply directly to any particles occurring in nature, but all the same it is useful 
for setting up an approximate treatment for an assembly of electrons, as will be 
shown in §58.

We have seen that each permutation P of the n particles is a linear operator 
which can be applied to any ket for the assembly. Hence we can regard P as 
a dynamical variable in our system of n particles. There are n! permutations, 
each of which can be regarded as a dynamical variable. One of them, $P_1$ say,
is the identical permutation, which is equal to unity. The product of any two
permutations is a third permutation and hence any function of the permutations
is reducible to a linear function of them. Any permutation P has a reciprocal $P^{-1}$
satisfying

$$PP^{-1} = P^{-1}P = P_1 = 1.$$

A permutation P can be applied to a bra $\langle X|$ for the assembly, to give
another bra, which we shall denote for the present by $P\langle X|$. If P is applied
to both factors of the product $\langle X|Y\rangle$, the product must be unchanged, since it is
just a number, independent of any order of the particles. Thus

$$(P\langle X|)\,P|Y\rangle = \langle X|Y\rangle,$$

showing that

$$P\langle X| = \langle X|P^{-1}. \tag{7}$$

Now $P\langle X|$ is the conjugate imaginary of $P|X\rangle$ and is thus equal to $\langle X|\bar{P}$,
and hence from (7)

$$\bar{P} = P^{-1}. \tag{8}$$


Thus a permutation is not in general a real dynamical variable, its conjugate 
complex being equal to its reciprocal. 

Any permutation of the numbers 1, 2, 3, ..., n may be expressed in the cyclic
notation, e.g. with n = 8

$$P_a = (143)\,(27)\,(58)\,(6), \tag{9}$$

in which each number is to be replaced by the succeeding number in a bracket,
unless it is the last in a bracket, when it is to be replaced by the first in
that bracket. Thus $P_a$ changes the numbers 12345678 into 47138625. The type of
any permutation is specified by the partition of the number n which is provided by
the number of numbers in each of the brackets. Thus the type of $P_a$ is specified by
the partition 8 = 3+2+2+1. Permutations of the same type, i.e. corresponding to
the same partition, we shall call similar. Thus, for example, $P_a$ in (9) is similar to

$$P_b = (871)\,(35)\,(46)\,(2). \tag{10}$$

The whole of the n! possible permutations may be divided into sets of similar
permutations, each such set being called a class. The permutation $P_1 = 1$ forms
a class by itself. Any permutation is similar to its reciprocal.

When two permutations $P_a$ and $P_b$ are similar, either of them, say $P_b$, may be
obtained by making a certain permutation $P_x$ in the other $P_a$. Thus, in our example
(9), (10) we can take $P_x$ to be the permutation that changes 14327586 into
87135462, i.e. the permutation

$$P_x = (18623)\,(475).$$

Different ways of writing $P_a$ and $P_b$ in the cyclic notation would lead to
different $P_x$’s. Any of these $P_x$’s applied to the product $P_a|X\rangle$ would change it into
$P_bP_x|X\rangle$, i.e.

$$P_xP_a|X\rangle = P_bP_x|X\rangle.$$

Hence

$$P_b = P_xP_aP_x^{-1}, \tag{11}$$

which expresses the condition for $P_a$ and $P_b$ to be similar as an algebraic equation.
The existence of any $P_x$ satisfying (11) is sufficient to show that $P_a$ and $P_b$
are similar.
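
The example of this section can be worked through mechanically. The following is a minimal sketch, assuming a permutation is stored as a dictionary sending each number to the number that replaces it, and taking the cycles of the example to be (143)(27)(58)(6), (871)(35)(46)(2) and (18623)(475); the helper names are illustrative choices.

```python
def from_cycles(cycles, n):
    """Build {number: replacement} on 1..n from the cyclic notation."""
    perm = {i: i for i in range(1, n + 1)}
    for cycle in cycles:
        for pos, number in enumerate(cycle):
            # Each number goes to its successor; the last goes to the first.
            perm[number] = cycle[(pos + 1) % len(cycle)]
    return perm

def compose(p, q):
    """The permutation 'apply q first, then p'."""
    return {i: p[q[i]] for i in q}

def inverse(p):
    """The reciprocal permutation."""
    return {image: i for i, image in p.items()}

def cycle_type(p):
    """The partition of n given by the cycle lengths, largest first."""
    seen, lengths = set(), []
    for start in p:
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = p[i]
            length += 1
        if length:
            lengths.append(length)
    return sorted(lengths, reverse=True)

P_a = from_cycles([(1, 4, 3), (2, 7), (5, 8), (6,)], 8)
P_b = from_cycles([(8, 7, 1), (3, 5), (4, 6), (2,)], 8)
P_x = from_cycles([(1, 8, 6, 2, 3), (4, 7, 5)], 8)

image = "".join(str(P_a[i]) for i in range(1, 9))        # what 12345678 becomes
conjugate = compose(P_x, compose(P_a, inverse(P_x)))      # P_x P_a P_x^-1
print(image, cycle_type(P_a), conjugate == P_b)
```

The run reproduces 47138625 as the result of applying the first permutation, gives the partition 3+2+2+1 for its type, and confirms that conjugating it with the third permutation yields the second, in the sense of (11).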


56. Permutations as constants of the motion 
Any symmetrical function V of the dynamical variables of all the particles 
is unchanged by the application of any permutation P, so P applied to 
the product V |X) affects only the factor |X), thus 
$$PV|X\rangle = VP|X\rangle.$$

Hence

$$PV = VP, \tag{12}$$

showing that a symmetrical function of the dynamical variables commutes with
every permutation. The Hamiltonian is a symmetrical function of the dynamical
variables and thus commutes with every permutation. It follows that each
permutation is a constant of the motion. This holds even if the Hamiltonian is
not constant. If $|Xt\rangle$ is any solution of Schrödinger’s equation of motion, $P|Xt\rangle$
is another.

In dealing with any system in quantum mechanics, when we have found 
a constant of the motion a, we know that if for any state of motion, a initially has 
the numerical value a’, then it always has this value, so that we can assign different 
numbers a’ to the different states and so obtain a classification of the states. 
The procedure is not so straightforward, however, when we have several constants 
of the motion a which do not commute (as is the case with our permutations P), 
since we cannot in general assign numerical values for all the a’s simultaneously 
to any state. Let us first take the case of a system whose Hamiltonian does not 
involve the time explicitly. The existence of constants of the motion a which 
do not commute is then a sign that the system is degenerate. This is because, 
for a non-degenerate system, the Hamiltonian H by itself forms a complete set 
of commuting observables and hence, from Theorem 2 of §19, each of the a’s is
a function of H and therefore commutes with any other a. 

We must now look for a function β of the a’s which has one and the same
numerical value β’ for all those states belonging to one energy-level H’, so that
we can use β for classifying the energy-levels of the system. We can express
the condition for β by saying that it must be a function of H and must therefore
commute with every dynamical variable that commutes with H, i.e. with every
constant of the motion. If the a’s are the only constants of the motion, or if
they are a set that commute with all other independent constants of the motion,
our problem reduces to finding a function β of the a’s which commutes with
all the a’s. We can then assign a numerical value β’ for β to each energy-level
of the system. If we can find several such functions β, they must all commute
with each other, so that we can give them all numerical values simultaneously.
We obtain thus a classification of the energy-levels. When the Hamiltonian involves
the time explicitly one cannot talk about energy-levels, but the β’s will still give
a useful classification of the states.

We follow this method in dealing with our permutations P. We must find
a function χ of the P’s such that $P\chi P^{-1} = \chi$ for every P. It is evident that
a possible χ is $\sum P_c$, the sum of all the permutations in a certain class c,
i.e. the sum of a set of similar permutations, since $\sum PP_cP^{-1}$ must consist of
the same permutations summed in a different order. There will be one such χ
for each class. Further, there can be no other independent χ, since an arbitrary
function of the P’s can be expressed as a linear function of them with numerical
coefficients, and it will not then commute with every P unless the coefficients of
similar P’s are always the same. We thus obtain all the χ’s that can be used for
classifying the states. It is convenient to define each χ as an average instead of
a sum, thus

$$\chi_c = n_c^{-1}\sum P_c,$$

where $n_c$ is the number of P’s in the class c. An alternative expression for $\chi_c$ is

$$\chi_c = (n!)^{-1}\sum_P PP_cP^{-1}, \tag{13}$$

the sum being extended over all the n! permutations P, it being easy to verify that
this sum contains each member of the class c the same number of times. For each
permutation P there is one χ, χ(P) say, equal to the average of all permutations
similar to P. One of the χ’s is $\chi(P_1) = 1$.
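
These properties of the class averages can be verified directly for three particles. The following is a minimal sketch, assuming a linear function of the permutations is stored as a dictionary from permutations (tuples giving the image of each of 0, 1, 2) to rational coefficients; all helper names are illustrative choices.

```python
from fractions import Fraction
from itertools import permutations

def compose(p, q):
    """The permutation 'q first, then p', with p[i] the image of i."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def cycle_type(p):
    """Cycle lengths, largest first: the partition that labels the class of p."""
    seen, lengths = set(), []
    for start in range(len(p)):
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = p[i]
            length += 1
        if length:
            lengths.append(length)
    return tuple(sorted(lengths, reverse=True))

def multiply(x, y):
    """Product of two linear functions of the permutations."""
    out = {}
    for p, cp in x.items():
        for q, cq in y.items():
            r = compose(p, q)
            out[r] = out.get(r, 0) + cp * cq
    return {perm: coeff for perm, coeff in out.items() if coeff != 0}

def class_averages(group):
    """One average of similar permutations for each class, keyed by partition."""
    classes = {}
    for p in group:
        classes.setdefault(cycle_type(p), []).append(p)
    return {t: {p: Fraction(1, len(members)) for p in members}
            for t, members in classes.items()}

def conjugation_average(p_c, group):
    """Right-hand side of (13): the average of P P_c P^-1 over all P."""
    out = {}
    for p in group:
        r = compose(p, compose(p_c, inverse(p)))
        out[r] = out.get(r, 0) + Fraction(1, len(group))
    return out

group = list(permutations(range(3)))     # the 3! permutations of three particles
chi = class_averages(group)
commutes_with_all = all(multiply({p: 1}, x) == multiply(x, {p: 1})
                        for p in group for x in chi.values())
same_as_13 = conjugation_average((1, 0, 2), group) == chi[(2, 1)]
print(len(chi), commutes_with_all, same_as_13)
```

For n = 3 there are three class averages, each commutes with every permutation, and the average over conjugates in (13) coincides with the plain class average.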

The constants of the motion $\chi_1, \chi_2, \ldots, \chi_m$ obtained in this way will each have
a definite numerical value for every stationary state of the system, in the case
when the Hamiltonian does not involve the time explicitly, and also in the general
case can be used for classifying the states, there being one set of states for every
permissible set of numerical values $\chi'_1, \chi'_2, \ldots, \chi'_m$ for the χ’s. Since the χ’s are
always constants of the motion, these sets of states will be exclusive, i.e. transitions
will never take place from a state in one set to a state in another.

The permissible sets of values χ’ that one can give to the χ’s are limited by
the fact that there exist algebraic relations between the χ’s. The product of any
two χ’s, $\chi_p\chi_q$, is of course expressible as a linear function of the P’s, and since
it commutes with every P it must be expressible as a linear function of the χ’s, thus

$$\chi_p\chi_q = a_1\chi_1 + a_2\chi_2 + \cdots + a_m\chi_m, \tag{14}$$

where the a’s are numbers. Any numerical values χ’ that one gives to the χ’s
must be eigenvalues of the χ’s and must satisfy these same algebraic equations.
For every solution χ’ of these equations there is one exclusive set of states.
One solution is evidently $\chi'_p = 1$ for every $\chi_p$, giving the set of symmetrical states.
A second obvious solution, giving the set of antisymmetrical states, is $\chi'_p = \pm 1$,
the + or − sign being taken according to whether the permutations in the class p
are even or odd. The other solutions may be worked out in any special case by
ordinary algebraic methods, as the coefficients a in (14) may be obtained directly
by a consideration of the types of permutation to which the χ’s concerned refer.
Any solution is, apart from a certain factor, what is called in group theory
a character of the group of permutations. The χ’s are all real dynamical variables,
since each P and its conjugate complex $P^{-1}$ are similar and will occur added
together in the definition of any χ, so that the $\chi'$’s must be all real numbers.

The number of possible solutions of the equations (14) may easily be
determined, since it must equal the number of different eigenvalues of an arbitrary
function $B$ of the $\chi$'s. We can express $B$ as a linear function of the $\chi$'s with
the help of equations (14); thus

$$B = b_1\chi_1 + b_2\chi_2 + \cdots + b_m\chi_m. \tag{15}$$

Similarly, we can express each of the quantities $B^2, B^3, \ldots, B^m$ as a linear function
of the $\chi$'s. From the $m$ equations thus obtained, together with the equation
$\chi(P_1) = 1$, we can eliminate the $m$ unknowns $\chi_1, \chi_2, \ldots, \chi_m$, obtaining as result
an algebraic equation of degree $m$ for $B$,

$$B^m + c_1B^{m-1} + c_2B^{m-2} + \cdots + c_m = 0.$$

The $m$ solutions of this equation give the $m$ possible eigenvalues for $B$, each of
which will, according to (15), be a linear function of $b_1, b_2, \ldots, b_m$ whose coefficients
are a permissible set of values $\chi'_1, \chi'_2, \ldots, \chi'_m$. The sets of values $\chi'$ thus obtained
must be all different, since if there were fewer than $m$ different permissible sets
of values $\chi'$ for the $\chi$'s, there would exist a linear function of the $\chi$'s every one
of whose eigenvalues vanishes, which would mean that the linear function itself
vanishes and the $\chi$'s are not linearly independent. Thus the number of permissible
sets of numerical values for the $\chi$'s is just equal to $m$, which is the number of
classes of permutations or the number of partitions of $n$. This number is therefore
the number of exclusive sets of states.
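The equality between the number of classes of permutations and the number of partitions of $n$ can be confirmed for small $n$. The following sketch is an illustration with hypothetical function names: it counts classes by enumerating all permutations and grouping them by cycle type (two permutations are similar exactly when their cycle lengths agree), and counts partitions by a simple recursion.

```python
from itertools import permutations

def cycle_type(perm):
    """Cycle lengths of a permutation given as a tuple of images, largest first."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = perm[j]
            length += 1
        if length:
            lengths.append(length)
    return tuple(sorted(lengths, reverse=True))

def count_classes(n):
    """Number of classes of permutations of n objects, by brute force."""
    return len({cycle_type(p) for p in permutations(range(n))})

def count_partitions(n, largest=None):
    """Number of ways of writing n as a sum of positive integers, order ignored."""
    if largest is None:
        largest = n
    if n == 0:
        return 1
    return sum(count_partitions(n - k, k) for k in range(1, min(n, largest) + 1))

for n in range(1, 7):
    print(n, count_classes(n), count_partitions(n))
```

For each $n$ printed the two counts agree, and this common value is the number $m$ of exclusive sets of states.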

All dynamical variables of physical importance and all observable quantities
are symmetrical between the particles and thus commute with all the $P$'s.
Thus the only functions of the $P$'s of physical importance are the $\chi$'s. The states
corresponding to $|\chi'\rangle$ and to $f(P)|\chi'\rangle$, where $|\chi'\rangle$ is any eigenket of the $\chi$'s
belonging to the eigenvalues $\chi'$ and $f(P)$ is any function of the $P$'s such that
$f(P)|\chi'\rangle \neq 0$, are observationally indistinguishable and are thus physically
equivalent. There is a definite number, $n(\chi')$ say, of independent kets which can
be formed by multiplying $|\chi'\rangle$ by functions of the $P$'s, which number depends only
on the $\chi'$'s. It is the number of rows and columns in a matrix representation of
the $P$'s in which each $\chi$ is equal to $\chi'$. If $|\chi'\rangle$ corresponds to a stationary state,
$n(\chi')$ will be its degree of degeneracy (so far as concerns degeneracy caused by
the symmetry between the particles). This degeneracy cannot be removed by any
perturbation that is symmetrical between the particles.


57. Determination of the energy-levels 

Let us apply the perturbation method of §43 and make a first-order calculation
of the energy-levels in the case when the Hamiltonian does not involve the time
explicitly. We suppose that for our unperturbed stationary states of the assembly
each of the similar particles has its own individual state. With $n$ particles,
we shall have $n$ of these states, corresponding to kets $|\alpha^1\rangle, |\alpha^2\rangle, \ldots, |\alpha^n\rangle$ say,
which we assume for the present to be all orthogonal. The ket for the assembly
is then

$$|X\rangle = |\alpha^1_1\rangle|\alpha^2_2\rangle \cdots |\alpha^n_n\rangle, \tag{16}$$

like (1) with $\alpha^1, \alpha^2, \ldots$ instead of $a, b, \ldots$. If we apply any permutation $P$ to it
we get another ket

$$P|X\rangle = |\alpha^1_r\rangle|\alpha^2_s\rangle \cdots |\alpha^n_z\rangle \tag{17}$$

say, $r, s, \ldots, z$ being some permutation of the numbers $1, 2, \ldots, n$, corresponding
to another stationary state of the assembly with the same energy. There are thus
altogether $n!$ unperturbed states with this energy, if we assume there are no other
causes of degeneracy. According to the method of §43 when the unperturbed
system is degenerate, we must consider those elements of the matrix representing
the perturbing energy $V$ that refer to two states with the same energy, i.e. those
of the type $\langle X|P_aVP_b|X\rangle$. These will form a matrix with $n!$ rows and columns,
whose eigenvalues are the first-order corrections in the energy-levels.

We must now introduce another kind of permutation operator which can be
applied to kets of the form (17), namely a permutation which acts on the indices
of the $\alpha$'s. We denote such a permutation operator by $P^\alpha$. The essential difference
between the $P$'s and the $P^\alpha$'s may be seen in the following way. Let us consider
a permutation in the general sense, say that consisting of the interchange of 2
and 3. This may be interpreted either as the interchange of the objects 2 and 3
or as the interchange of the objects in the places 2 and 3, these two operations
producing in general quite different results. The first of these interpretations
is the one that gives the operators $P$, the objects concerned being the similar
particles. A permutation $P$ can be applied to an arbitrary ket for the assembly.
A permutation with the second interpretation has a meaning, however, only when
applied to a ket of the form (17), for which each of the particles is in a 'place'
specified by an $\alpha$, or to a sum of kets of the form (17). A permutation $P$ may
be considered as an ordinary dynamical variable. A permutation $P^\alpha$ may be
considered as a dynamical variable in a restricted sense, valid when one is dealing
only with states obtainable by superposition of the various states (17). This is
the case for our present perturbation problem.

We can form algebraic functions of the $P^\alpha$ which will be other operators
applicable to kets of the form (17). In particular we can form $\chi(P^\alpha_c)$, the average of
all $P^\alpha$'s in a certain class $c$. This must equal $\chi(P_c)$, the average of the permutation
operators $P$ in the same class, since the total set of all permutations in a given class
must evidently be the same whether the permutations are applied to the particles
or to the places the particles are in. Any $P$ commutes with any $P^\alpha$, i.e.

$$P_aP^\alpha_b = P^\alpha_bP_a. \tag{18}$$

By labelling the $\alpha$'s by the same numbers $1, 2, 3, \ldots, n$ which label the particles,
we set up a one-one correspondence between the $\alpha$'s and the particles, so that given
any permutation $P_a$ applying to the particles, we can give a meaning to the same
permutation $P^\alpha_a$ applying to the $\alpha$'s. This meaning is such that, for the ket $|X\rangle$
given by (16),

$$P^\alpha_aP_a|X\rangle = |X\rangle. \tag{19}$$

Since the various kets $|\alpha^1\rangle, |\alpha^2\rangle, \ldots$ are orthogonal, $|X\rangle$ and $P|X\rangle$ are orthogonal
unless $P = 1$. It follows that, for any coefficients $c_P$,

$$\sum_P c_P\langle X|P^\alpha P_a|X\rangle = c_{P_a}, \tag{20}$$

provided $|X\rangle$ is normalized, the summation being over all the $n!$ permutations $P$
or $P^\alpha$, with $P_a$ fixed. Now define $V_P$ by

$$V_P = \langle X|VP|X\rangle. \tag{21}$$

We then have, for any two permutations $P_x$ and $P_y$,

$$\langle X|P_xVP_y|X\rangle = \langle X|VP_xP_y|X\rangle = V_{P_xP_y} = \sum_P V_P\langle X|P^\alpha P_xP_y|X\rangle$$

with the help of (20). From (18) this gives

$$\langle X|P_xVP_y|X\rangle = \sum_P V_P\langle X|P_xP^\alpha P_y|X\rangle. \tag{22}$$

We may write this result as

$$V \approx \sum_P V_PP^\alpha, \tag{23}$$

where the $\approx$ sign means an equation in a restricted sense, the operators on the two
sides being equal so long as they are used only with kets of the form $P|X\rangle$ and
their conjugate imaginary bras.
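The distinction between permuting objects and permuting the objects in places, together with the relations (18) and (19), can be made concrete with a small model. This is a sketch under stated assumptions, not the author's construction: a ket of the form (17) is stored as a tuple whose entry in position $i$ is the label of the particle occupying the state $\alpha^i$, and the convention chosen for a place permutation (the content of place $i$ is moved to place `perm[i-1]`) is an assumption made so that (19) holds. The function names are invented for the example.

```python
from itertools import permutations

n = 3
# |X> of (16): particle k occupies the state alpha^k, i.e. sits in place k.
X = tuple(range(1, n + 1))

def act_on_particles(perm, ket):
    """A permutation P: relabel the particles, k -> perm[k-1]."""
    return tuple(perm[k - 1] for k in ket)

def act_on_places(perm, ket):
    """A permutation P^alpha: move whatever sits in place i to place perm[i-1]."""
    out = [None] * len(ket)
    for i, particle in enumerate(ket, start=1):
        out[perm[i - 1] - 1] = particle
    return tuple(out)

# The interchange of 2 and 3, read in the two senses, acts differently on a ket.
swap_23 = (1, 3, 2)
ket = (3, 1, 2)
print(act_on_particles(swap_23, ket))   # objects 2 and 3 interchanged: (2, 1, 3)
print(act_on_places(swap_23, ket))      # contents of places 2 and 3 interchanged: (3, 2, 1)

# Relation (18): every P commutes with every P^alpha.
commute = all(
    act_on_particles(p, act_on_places(q, k)) == act_on_places(q, act_on_particles(p, k))
    for p in permutations(X) for q in permutations(X) for k in permutations(X)
)
# Relation (19): the same permutation applied in both senses restores |X>.
restores = all(act_on_places(p, act_on_particles(p, X)) == X for p in permutations(X))
print(commute, restores)
```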

The formula (23) shows that the perturbing energy $V$ is equal,
in the restricted sense, to a linear function of the permutation operators $P^\alpha$ with
coefficients $V_P$ given by (21). The restricted sense is adequate for the calculation
of the first-order correction in the energy-levels, as this calculation involves only
those matrix elements of $V$ given by (22). The formula (23) is a very convenient
one because the expression on its right-hand side is easily handled.

As an example of an application of (23) we shall determine the average
energy of all those states, arising from the unperturbed state (16), that belong
to one exclusive set. This requires us to calculate the average eigenvalue of
$V$ for those states (17) for which the $\chi$'s have specified numerical values $\chi'$.
Now the average eigenvalue of $P^\alpha_a$ for any of these states equals that of $P^\alpha_bP^\alpha_a(P^\alpha_b)^{-1}$
for arbitrary $P^\alpha_b$ and thus equals that of $n!^{-1}\sum_b P^\alpha_bP^\alpha_a(P^\alpha_b)^{-1}$, which is $\chi'(P^\alpha_a)$
or $\chi'(P_a)$. Hence the average eigenvalue of $V$ is $\sum_P V_P\chi'(P)$. A similar method
could be used for calculating the average eigenvalue of any function of $V$, it being
necessary only to replace each $P^\alpha$ by $\chi'(P)$ to perform the averaging.

The number of energy-levels in an exclusive set $\chi = \chi'$ that arise from
a given state of the unperturbed system is equal to the number of eigenvalues
of the right-hand side of (23) that are consistent with the equations $\chi = \chi'$.
This number is the number $n(\chi')$ introduced at the end of the preceding section,
and is thus just the degree of degeneracy of the states in this set.

We have assumed that the individual kets $|\alpha^1\rangle, |\alpha^2\rangle, \ldots$ which determine
the unperturbed state according to (16) are all orthogonal. The theory can
easily be extended to the case when some of these kets are equal, any two
that are not equal being still restricted to be orthogonal. We now have
some permutations $P^\alpha$ such that $P^\alpha|X\rangle = |X\rangle$, namely those permutations
which involve only interchanges of equal $\alpha$'s. Equation (20) will now hold
if the summation is extended only over those $P$'s which make $P^\alpha|X\rangle$ different.
With this change in the meaning of $\sum_P$, all the previous equations still hold,
including the result (23). For the present $|X\rangle$ there will be restrictions on
the possible numerical values of the $\chi$'s, e.g. they cannot have those values
corresponding to $|X\rangle$ being antisymmetrical.


58. Application to electrons 

Let us consider the case when the similar particles are electrons. This requires, 
according to Pauli’s exclusion principle discussed in §54, that we take into account 
only the antisymmetrical states. It is now necessary to make explicit reference 
to the fact that electrons have spins, which show themselves through an angular 
momentum and a magnetic moment. The effect of the spin on the motion of 
an electron in an electromagnetic field is not very great. There are additional 
forces on the electron due to its magnetic moment, requiring additional terms in 
the Hamiltonian. The spin angular momentum does not have any direct action 
on the motion, but it comes into play when there are forces tending to rotate 
the magnetic moment, since the magnetic moment and angular momentum are 
constrained to be always in the same direction. In the absence of a strong magnetic 
field these effects are all small, of the same order of magnitude as the corrections 
required by relativistic mechanics, and there would be no point in taking them 
into account in a non-relativistic theory. The importance of the spin lies not in 
these small effects on the motion of the electron, but in the fact that it gives two 
internal states to the electron, corresponding to the two possible values of the spin 
component in any assigned direction, which causes a doubling in the number of 
independent states of an electron. This fact has far-reaching consequences when 
combined with Pauli’s exclusion principle. 


In dealing with an assembly of electrons we have two kinds of dynamical
variables. The first kind, which we may call the orbital variables, consists
of the coordinates $x, y, z$ of all the electrons and their conjugate momenta
$p_x, p_y, p_z$. The second kind consists of the spin variables, the variables $\sigma_x, \sigma_y, \sigma_z$,
as introduced in §37, for all the electrons. These two kinds of variables belong
to different degrees of freedom. According to §§20 and 21, a ket fixing the state
of the whole system may be of the form $|A\rangle|B\rangle$, where $|A\rangle$ is a ket referring to
the orbital variables alone and $|B\rangle$ is a ket referring to the spin variables alone,
and the general ket fixing a state of the whole system is a sum or integral of kets
of this form. This way of looking at things enables us to introduce two kinds of
permutation operators, the first kind, $P^x$ say, applying to the orbital variables only
and operating only on the factor $|A\rangle$, and the second kind, $P^\sigma$ say, applying only
to the spin variables and operating only on the factor $|B\rangle$. The $P^x$'s and $P^\sigma$'s can
each be applied to any ket for the whole system, not merely to certain special kets,
like the $P^\alpha$'s of the preceding section. The permutations $P$ that we have had up to
the present apply to all the dynamical variables of the particles concerned, so for
electrons they will apply to both the orbital and the spin variables. This means
that each $P_a$ equals the product

$$P_a = P^x_aP^\sigma_a. \tag{24}$$


We can now see the need for taking the spin variables into account when
applying Pauli's exclusion principle, even if we neglect the spin forces in
the Hamiltonian. For any state occurring in nature each $P_a$ must have the value $\pm 1$,
according to whether it is an even or an odd permutation, so from (24)

$$P^x_aP^\sigma_a = \pm 1. \tag{25}$$

The theory of the three preceding sections would become trivial if applied
directly to electrons, for which each $P_a = \pm 1$. We may, however, apply it to the $P^x$
permutations of electrons. The $P^\sigma$'s are constants of the motion if we neglect
the terms in the Hamiltonian that arise from the spin forces, since this neglect
results in the Hamiltonian not involving the spin dynamical variables $\sigma$ at all.
The $P^x$'s must then also be constants of the motion. We can now introduce
new $\chi$'s, equal to the average of all of the $P^x$'s in each class, and assert that for
any permissible set of numerical values $\chi'$ for these $\chi$'s there will be one exclusive
set of states. Thus there exist exclusive sets of states for systems containing
many electrons even when we restrict ourselves to a consideration of only those
states that satisfy Pauli's principle. The exclusiveness of the sets of states is now,
of course, only approximate, since the $\chi$'s are constants only so long as we neglect
the spin forces. There will actually be a small probability for a transition from
a state in one set to a state in another.

Equation (25) gives us a simple connexion between the $P^x$'s and $P^\sigma$'s,
which means that instead of studying the dynamical variables $P^x$ we can get
all the results we want, e.g. the characters $\chi'$, by studying the dynamical
variables $P^\sigma$. The $P^\sigma$'s are much easier to study on account of there being only two
independent states of spin for each electron. This fact results in there being fewer
characters $\chi'$ for the group of permutations of the $\sigma$-variables than for the group
of general permutations, since it prevents a ket in the spin variables from being
antisymmetrical in more than two of them.
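The last remark can be illustrated directly. With only two spin states per electron, any product ket for three spins repeats at least one label, so the alternating sum over permutations cancels term by term. The sketch below is an illustration only; the labels `"up"` and `"down"` and the function names are hypothetical.

```python
from collections import Counter
from itertools import permutations, product

def sign(perm):
    """Parity (+1 or -1) of a permutation of 0..n-1, found by counting inversions."""
    n = len(perm)
    inversions = sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))
    return -1 if inversions % 2 else 1

def antisymmetrize(spins):
    """Coefficients of sum_P (+-1) P applied to the product ket with these spin labels."""
    ket = Counter()
    for perm in permutations(range(len(spins))):
        ket[tuple(spins[i] for i in perm)] += sign(perm)
    return {labels: c for labels, c in ket.items() if c != 0}

# Two spins: an antisymmetrical combination exists.
print(antisymmetrize(("up", "down")))

# Three spins: every antisymmetrized product ket vanishes identically.
print(all(antisymmetrize(s) == {} for s in product(("up", "down"), repeat=3)))
```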

The study of the $P^\sigma$'s is made specially easy by the fact that we can express
them as algebraic functions of the dynamical variables $\sigma$. Consider the quantity

$$O_{12} = \tfrac12\{1 + \sigma_{x1}\sigma_{x2} + \sigma_{y1}\sigma_{y2} + \sigma_{z1}\sigma_{z2}\} = \tfrac12\{1 + (\sigma_1, \sigma_2)\}.$$

With the help of equations (50) and (51) of §37 we find readily that

$$(\sigma_1, \sigma_2)^2 = (\sigma_{x1}\sigma_{x2} + \sigma_{y1}\sigma_{y2} + \sigma_{z1}\sigma_{z2})^2 = 3 - 2(\sigma_1, \sigma_2), \tag{26}$$

and hence that

$$O_{12}^2 = \tfrac14\{1 + 2(\sigma_1, \sigma_2) + (\sigma_1, \sigma_2)^2\} = 1. \tag{27}$$

Again, we find

$$O_{12}\sigma_{x1} = \tfrac12\{\sigma_{x1} + \sigma_{x2} - i\sigma_{z1}\sigma_{y2} + i\sigma_{y1}\sigma_{z2}\},$$

$$\sigma_{x2}O_{12} = \tfrac12\{\sigma_{x2} + \sigma_{x1} + i\sigma_{y1}\sigma_{z2} - i\sigma_{z1}\sigma_{y2}\},$$

and hence

$$O_{12}\sigma_{x1} = \sigma_{x2}O_{12}.$$

Similar relations hold for $\sigma_{y1}$ and $\sigma_{z1}$, so that we have

$$O_{12}\sigma_1 = \sigma_2O_{12}$$

or

$$O_{12}\sigma_1O_{12}^{-1} = \sigma_2.$$

From this we can obtain with the help of (27)

$$O_{12}\sigma_2O_{12}^{-1} = \sigma_1.$$

These commutation relations for $O_{12}$ with $\sigma_1$ and $\sigma_2$ are precisely the same
as those for $P^\sigma_{12}$, the permutation consisting of the interchange of the spin
variables of electrons 1 and 2. Thus we can put $O_{12} = cP^\sigma_{12}$, where $c$ is
a number. Equation (27) shows that $c = \pm 1$. To determine which of
these values for $c$ is the correct one, we observe that the eigenvalues of $P^\sigma_{12}$
are $1, 1, 1, -1$, corresponding to the fact that there exist three independent
symmetrical and one antisymmetrical state in the spin variables of two electrons,
namely, with the notation of §37, the states represented by the three symmetrical
functions $f_\alpha(\sigma'_{z1})f_\alpha(\sigma'_{z2})$, $f_\beta(\sigma'_{z1})f_\beta(\sigma'_{z2})$, $f_\alpha(\sigma'_{z1})f_\beta(\sigma'_{z2}) + f_\beta(\sigma'_{z1})f_\alpha(\sigma'_{z2})$ and the one
antisymmetrical function $f_\alpha(\sigma'_{z1})f_\beta(\sigma'_{z2}) - f_\beta(\sigma'_{z1})f_\alpha(\sigma'_{z2})$. Thus the mean of
the eigenvalues of $P^\sigma_{12}$ is $\tfrac12$. Now the mean of the eigenvalues of $(\sigma_1, \sigma_2)$ is evidently
zero and hence the mean of the eigenvalues of $O_{12}$ is $\tfrac12$. Thus we must have $c = +1$,
and so we can put

$$P^\sigma_{12} = \tfrac12\{1 + (\sigma_1, \sigma_2)\}. \tag{28}$$

In this way any permutation $P^\sigma$ consisting simply of an interchange can be
expressed as an algebraic function of the $\sigma$'s. Any other permutation $P^\sigma$ can
be expressed as a product of interchanges and can therefore also be expressed
as a function of the $\sigma$'s. With the help of (25) we can now express the $P^x$'s as
algebraic functions of the $\sigma$'s and eliminate the $P^\sigma$'s from the discussion. We have,
since the $-$ sign must be taken in (25) when the permutations are interchanges
and since the square of an interchange is unity,

$$P^x_{12} = -\tfrac12\{1 + (\sigma_1, \sigma_2)\}. \tag{29}$$
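The relations (26), (27) and (28) can be verified numerically in an explicit representation. The sketch below is an illustration rather than part of the text: it assumes the usual 2 by 2 Pauli matrices for each electron and a product basis ordered as up-up, up-down, down-up, down-down, builds $(\sigma_1, \sigma_2)$ as a 4 by 4 matrix with plain Python lists, and compares $\tfrac12\{1 + (\sigma_1, \sigma_2)\}$ with the matrix that interchanges the two spin labels. All helper names are invented for the example.

```python
I2 = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]

def kron(a, b):
    """Kronecker product: `a` acts on electron 1, `b` on electron 2."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def matmul(a, b):
    """Ordinary matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def add(*ms):
    """Elementwise sum of several matrices of equal shape."""
    return [[sum(m[i][j] for m in ms) for j in range(len(ms[0][0]))]
            for i in range(len(ms[0]))]

def scale(c, m):
    """Multiply every element of a matrix by the number c."""
    return [[c * x for x in row] for row in m]

I4 = kron(I2, I2)
DOT = add(kron(SX, SX), kron(SY, SY), kron(SZ, SZ))     # (sigma_1, sigma_2)
O12 = scale(0.5, add(I4, DOT))

# Interchange of the two spin labels: up-down <-> down-up, the other two kets fixed.
SWAP = [[1, 0, 0, 0],
        [0, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1]]

print(matmul(DOT, DOT) == add(scale(3, I4), scale(-2, DOT)))      # equation (26)
print(matmul(O12, O12) == I4)                                     # equation (27)
print(matmul(O12, kron(SX, I2)) == matmul(kron(I2, SX), O12))     # O12 sigma_x1 = sigma_x2 O12
print(O12 == SWAP)                                                # equation (28)
print(sum(O12[i][i] for i in range(4)) / 4)                       # mean eigenvalue
```

The trace check reproduces the mean eigenvalue $\tfrac12$ that fixes $c = +1$ in the text.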

The formula (29) may conveniently be used for the evaluation of
the characters $\chi'$ which define the exclusive sets of states. We have, for example,
for the permutations consisting of interchanges,

$$\chi_{12} = \chi(P^x_{12}) = -\frac12\Bigl\{1 + \frac{2}{n(n-1)}\sum_{r<s}(\sigma_r, \sigma_s)\Bigr\}.$$

If we introduce the dynamical variable $s$ to describe the magnitude of the total
spin angular momentum, $\tfrac12\sum_r\sigma_r$ in units of $\hbar$, through the formula

$$s(s+1) = \Bigl(\tfrac12\sum_r\sigma_r,\ \tfrac12\sum_t\sigma_t\Bigr),$$

in agreement with (39) of §36, we have

$$2\sum_{r<s}(\sigma_r, \sigma_s) = \Bigl(\sum_r\sigma_r,\ \sum_t\sigma_t\Bigr) - \sum_r(\sigma_r, \sigma_r) = 4s(s+1) - 3n.$$

Hence

$$\chi_{12} = -\frac12\Bigl\{1 + \frac{4s(s+1) - 3n}{n(n-1)}\Bigr\} = -\frac{n(n-4) + 4s(s+1)}{2n(n-1)}. \tag{30}$$

Thus $\chi_{12}$ is expressible as a function of the dynamical variable $s$ and of $n$,
the number of electrons. Any of the other $\chi$'s could be evaluated on similar
lines and would have to be a function of $s$ and $n$ only, since there are no other
symmetrical functions of all the $\sigma$ dynamical variables which could be involved.
There is therefore one set of numerical values $\chi'$ for the $\chi$'s, and thus one exclusive
set of states, for each eigenvalue $s'$ of $s$. The eigenvalues of $s$ are

$$\tfrac12n,\quad \tfrac12n - 1,\quad \tfrac12n - 2,\quad \ldots,$$

the sequence* terminating with $0$ or $\tfrac12$.
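Formula (30) is easy to tabulate. The following sketch (function names are hypothetical) evaluates $\chi_{12}$ exactly with rational arithmetic for each allowed eigenvalue of $s$, which gives a quick check against simple cases: for two electrons the value is $-1$ when $s = 1$ and $+1$ when $s = 0$, as (29) requires for symmetrical and antisymmetrical spin states respectively.

```python
from fractions import Fraction

def chi_12(n, s):
    """Equation (30): chi_12 = -(n(n-4) + 4s(s+1)) / (2n(n-1)), for n >= 2 electrons."""
    s = Fraction(s)
    return -(n * (n - 4) + 4 * s * (s + 1)) / (2 * n * (n - 1))

def spin_values(n):
    """Eigenvalues of s for n electrons: n/2, n/2 - 1, ..., ending with 0 or 1/2."""
    s = Fraction(n, 2)
    values = []
    while s >= 0:
        values.append(s)
        s -= 1
    return values

for n in (2, 3, 4):
    for s in spin_values(n):
        # multiplicity 2s + 1 of the corresponding multiplet is shown alongside
        print(n, s, chi_12(n, s), 2 * s + 1)
```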

We see in this way that each of the stationary states of a system with several
electrons is an eigenstate of $s$, the magnitude in units of $\hbar$ of the total spin angular
momentum $\tfrac12\sum_r\sigma_r$, belonging to a definite eigenvalue $s'$. For any given $s'$ there will
be $2s'+1$ possible values for a component of the total spin vector in any direction
and these will correspond to $2s'+1$ independent stationary states with the same
energy. When we do not neglect the forces due to the spin magnetic moments these
$2s'+1$ states will in general be split up into $2s'+1$ states with slightly different
energies, and will thus form a multiplet of multiplicity $2s'+1$. Transitions in which
$s'$ changes, i.e. transitions from one multiplicity to another, cannot occur when
the spin forces are neglected and will have only a small probability of occurrence
when the spin forces are not neglected.

*['sequence' is used for 'series'.]

We can determine the energy-levels of a system with several electrons to the first
approximation by applying the theory of the preceding section with the kets $|\alpha^r\rangle$
referring only to the orbital variables and using formula (23). If we consider
only the Coulomb forces between the electrons, then the interaction energy $V$ will
consist of a sum of parts each referring to only two electrons, which will result in
all the matrix elements $V_P$ vanishing except those for which $P^\alpha$ is the identical
permutation or is simply an interchange of two electrons. Thus (23) will reduce to

$$V \approx V_1 + \sum_{r<s}V_{rs}P^\alpha_{rs}, \tag{31}$$

an equation† in the restricted sense of (23), $V_{rs}$ being the matrix element
referring to the interchange of electrons $r$ and $s$. Since the $P^\alpha$'s have the same
properties as the $P^x$'s, any function of the $P^\alpha$'s will have the same eigenvalues
as the corresponding function of the $P^x$'s, so that the right-hand side of (31) will
have the same eigenvalues as

$$V_1 + \sum_{r<s}V_{rs}P^x_{rs}$$

or

$$V_1 - \tfrac12\sum_{r<s}V_{rs}\{1 + (\sigma_r, \sigma_s)\} \tag{32}$$

from (29). The eigenvalues of (32) will give the first-order corrections in
the energy-levels. The form of (32) shows that a model which assumes a coupling
energy between the spins of the various electrons, of magnitude $-\tfrac12V_{rs}(\sigma_r, \sigma_s)$ for
the electrons in the $r$ and $s$ orbital states, would meet with a fair amount of success.
This coupling energy is much greater than that of the spin magnetic moments.
Such models of the atom were in use before the justification by quantum mechanics
was obtained.
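As a worked illustration (not in the original text), take the simplest case $n = 2$ with the two orbital states different. The relation $2\sum_{r<s}(\sigma_r, \sigma_s) = 4s(s+1) - 3n$ of the preceding discussion then fixes the eigenvalues of $(\sigma_1, \sigma_2)$, and (32) gives the two first-order levels directly:

```latex
\begin{align*}
(\sigma_1,\sigma_2) &= 2s(s+1) - 3 =
  \begin{cases} +1, & s' = 1 \ \text{(threefold)},\\ -3, & s' = 0, \end{cases}\\[4pt]
V_1 - \tfrac12 V_{12}\{1 + (\sigma_1,\sigma_2)\} &=
  \begin{cases} V_1 - V_{12}, & s' = 1,\\ V_1 + V_{12}, & s' = 0. \end{cases}
\end{align*}
```

The two levels are thus separated by $2V_{12}$, which is the effect that the spin-coupling model reproduces.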

We may have two of the orbital states of the unperturbed system the same,
i.e. the kets $|\alpha^r\rangle$ in the orbital variables for two electrons may be the same.
Suppose $|\alpha^1\rangle$ and $|\alpha^2\rangle$ are the same. Then we must take only those eigenvalues
of (31) that are consistent with $P^\alpha_{12} = 1$, or those eigenvalues of (32) that are
consistent with $P^x_{12} = 1$ or $P^\sigma_{12} = -1$. From (28) this condition gives $(\sigma_1, \sigma_2) = -3$,
so that $(\sigma_1 + \sigma_2)^2 = 0$. Thus the resultant of the two spins $\sigma_1$ and $\sigma_2$ is zero,
which may be interpreted as the spins $\sigma_1$ and $\sigma_2$ being antiparallel. Thus we may
say that two electrons in the same orbital state have their spins antiparallel.
More than two electrons cannot be in the same orbital state.
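The intermediate steps, written out as a short sketch using only (28) and the fact that $(\sigma_r, \sigma_r) = 3$ for each electron, are:

```latex
\begin{align*}
P^\sigma_{12} = \tfrac12\{1 + (\sigma_1,\sigma_2)\} = -1
  \quad&\Longrightarrow\quad (\sigma_1,\sigma_2) = -3,\\
(\sigma_1 + \sigma_2)^2 = (\sigma_1,\sigma_1) + (\sigma_2,\sigma_2) + 2(\sigma_1,\sigma_2)
  &= 3 + 3 - 6 = 0 .
\end{align*}
```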


†[Explanation of the equation meaning added.]


59. An assembly of bosons
We consider a dynamical system composed of $u'$ similar particles. We set up
a representation for one of the particles with discrete basic kets $|\alpha^{(1)}\rangle, |\alpha^{(2)}\rangle,
|\alpha^{(3)}\rangle, \ldots$. Then, as explained in §54, we get a symmetrical representation of
the assembly of $u'$ particles by taking as basic kets the products

$$|\alpha^a_1\rangle|\alpha^b_2\rangle|\alpha^c_3\rangle \cdots |\alpha^g_{u'}\rangle = |\alpha^a_1\alpha^b_2\alpha^c_3 \ldots \alpha^g_{u'}\rangle \tag{1}$$

in which there is one factor for each particle, the suffixes $1, 2, 3, \ldots, u'$ of the $\alpha$'s
being the labels of the particles and the indices $a, b, c, \ldots, g$ denoting indices
$(1), (2), (3), \ldots$ in the basic kets for one particle. If the particles are bosons,
so that only symmetrical states occur in nature, then we need to work with
only the symmetrical kets that can be constructed from the kets (1). The states
corresponding to these symmetrical kets will form a complete set of states for
the assembly of bosons. We can build up a theory of them as follows.


We introduce the linear operator $S$ defined by

$$S = u'!^{-\frac12}\sum P, \tag{2}$$

the sum being taken over all the $u'!$ permutations of the $u'$ particles. Then $S$
applied to any ket for the assembly gives a symmetrical ket. We may therefore call
$S$ the symmetrizing operator. From (8) of §55 it is real. Applied to the ket (1)
it gives

$$u'!^{-\frac12}\sum P|\alpha^a_1\alpha^b_2\alpha^c_3 \ldots \alpha^g_{u'}\rangle = S|\alpha^a\alpha^b\alpha^c \ldots \alpha^g\rangle, \tag{3}$$

the labels of the particles being omitted on the right-hand side as they are no longer
relevant. The ket (3) corresponds to a state for the assembly of $u'$ bosons with
a definite distribution of the bosons among the various boson states, without any
particular boson being assigned to any particular state. The distribution of
bosons is specified if we specify how many bosons are in each boson state.


Let $n'_1, n'_2, n'_3, \ldots$ be the numbers of bosons in the states $\alpha^{(1)}, \alpha^{(2)}, \alpha^{(3)}, \ldots$
respectively with this distribution. The $n'$'s are defined algebraically by
the equation

$$\alpha^a + \alpha^b + \alpha^c + \cdots + \alpha^g = n'_1\alpha^{(1)} + n'_2\alpha^{(2)} + n'_3\alpha^{(3)} + \cdots. \tag{4}$$

The sum of the $n'$'s is of course $u'$. The number of $n'$'s is equal to the number of
basic kets $|\alpha^{(r)}\rangle$, which in most applications of the theory is very much greater
than $u'$, so most of the $n'$'s will be zero. If $\alpha^a, \alpha^b, \alpha^c, \ldots, \alpha^g$ are all different,
i.e. if the $n'$'s are all 0 or 1, the ket (3) is normalized, since in this case the terms
on the left-hand side of (3) are all orthogonal to one another and each contributes
$u'!^{-1}$ to the squared length of the ket. However, if $\alpha^a, \alpha^b, \alpha^c, \ldots, \alpha^g$ are not all
different, those terms on the left-hand side of (3) will be equal which arise from
permutations $P$ which merely interchange bosons in the same state. The number
of equal terms will be $n'_1!\,n'_2!\,n'_3!\ldots$, so the squared length of the ket (3) will be

$$\langle\alpha^a\alpha^b\alpha^c \ldots \alpha^g|S^2|\alpha^a\alpha^b\alpha^c \ldots \alpha^g\rangle = n'_1!\,n'_2!\,n'_3!\ldots. \tag{5}$$
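The counting behind (5) can be reproduced by brute force. In the sketch below, which is an illustration only, a product ket is a tuple of single-boson state labels, product kets with different tuples are assumed orthonormal, and the symmetrizing operator of (2) is applied by summing over all $u'!$ permutations. The function names are invented for the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial

def squared_length(states):
    """Squared length of S|states>, with S = (u'!)^(-1/2) * (sum of all permutations).

    `states` lists the single-boson state occupied by each of the u' particles.
    """
    u = len(states)
    # How many of the u'! permutations produce each distinct product ket.
    terms = Counter(permutations(states))
    return Fraction(sum(c * c for c in terms.values()), factorial(u))

def occupation_formula(states):
    """Right-hand side of (5): the product n'_1! n'_2! n'_3! ..."""
    result = 1
    for n in Counter(states).values():
        result *= factorial(n)
    return result

for states in [("a", "b", "c"), ("a", "a", "b"), ("a", "a", "a"), ("a", "a", "b", "b")]:
    print(states, squared_length(states), occupation_formula(states))
```

When all the states are different the squared length is 1, so the ket (3) is already normalized, in agreement with the text.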

For dealing with a general state of the assembly we can introduce the numbers
$n_1, n_2, n_3, \ldots$ of bosons in the states $\alpha^{(1)}, \alpha^{(2)}, \alpha^{(3)}, \ldots$ respectively and treat the $n$'s
as dynamical variables or as observables. They have the eigenvalues $0, 1, 2, \ldots, u'$.
The ket (3) is a simultaneous eigenket of all the $n$'s, belonging to the eigenvalues
$n'_1, n'_2, n'_3, \ldots$. The various kets (3) form a complete set for the dynamical system
consisting of $u'$ bosons, so the $n$'s all commute (see the converse to the theorem
of §13). Further, there is only one independent ket (3) belonging to any set of
eigenvalues $n'_1, n'_2, n'_3, \ldots$. Hence the $n$'s form a complete set of commuting
observables. If we normalize the kets (3) and then label the resulting kets by
the eigenvalues of the $n$'s to which they belong, i.e. if we put

$$(n'_1!\,n'_2!\,n'_3!\ldots)^{-\frac12}S|\alpha^a\alpha^b\alpha^c \ldots \alpha^g\rangle = |n'_1n'_2n'_3\ldots\rangle, \tag{6}$$

we get a set of kets $|n'_1n'_2n'_3\ldots\rangle$, with the $n'$'s taking on all non-negative integral
values adding up to $u'$, which kets will form the basic kets of a representation with
the $n$'s diagonal.

The $n$'s can be expressed as functions of the observables $\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_{u'}$
which define the basic kets of the individual bosons by means of the equations

$$n_a = \sum_r\delta_{\alpha_r\alpha^a}, \tag{7}$$

or the equations

$$\sum_a n_af(\alpha^a) = \sum_r f(\alpha_r) \tag{8}$$

holding for any function $f$.

Let us now suppose that the number of bosons in the assembly is not given,
but is variable. This number is then a dynamical variable or observable $u$,
with eigenvalues $0, 1, 2, \ldots$, and the ket (3) is an eigenket of $u$ belonging to
the eigenvalue $u'$. To get a complete set of kets for our dynamical system we must
now take all the symmetrical kets (3) for all values of $u'$. We may arrange them in
order thus

$$|\rangle,\quad |\alpha^a\rangle,\quad S|\alpha^a\alpha^b\rangle,\quad S|\alpha^a\alpha^b\alpha^c\rangle,\quad \ldots, \tag{9}$$

where first is written the ket, with no label, corresponding to the state with no
bosons present, then come the kets corresponding to states with one boson present,
then those corresponding to states with two bosons, and so on. A general state
corresponds to a ket which is a sum of the various kets (9). The kets (9) are
all orthogonal to one another, two kets referring to the same number of bosons
being orthogonal as before, and two referring to different numbers of bosons
being orthogonal since they are eigenkets of $u$ belonging to different eigenvalues.
By normalizing all the kets (9), we get a set of kets like (6) with no restriction on
the $n'$'s (i.e. each $n'$ taking on all non-negative integral values) and these kets form
the basic kets of a representation with the $n$'s diagonal for the dynamical system
consisting of a variable number of bosons.

If there is no interaction between the bosons and if the basic kets
$|\alpha^{(1)}\rangle, |\alpha^{(2)}\rangle, \ldots$ correspond to stationary states of a boson, the kets (9) will
correspond to stationary states for the assembly of bosons. The number $u$ of bosons
is now constant in time, but it need not be a specified number, i.e. the general
state is a superposition of states with various values for $u$. If the energy of one
boson is $H(\alpha)$, the energy of the assembly will be

$$\sum_r H(\alpha_r) = \sum_a n_aH^a \tag{10}$$

from (8), $H^a$ being short for the number $H(\alpha^a)$. This gives the Hamiltonian for
the assembly as a function of the dynamical variables $n$.
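Equations (7), (8) and (10) amount to counting the same sum in two ways: once boson by boson, once state by state. A minimal sketch follows, with made-up state labels and energies (the names `H`, `alphas` and the numbers are purely illustrative assumptions):

```python
from collections import Counter

# Hypothetical single-boson energies H^a for three boson states (illustrative numbers).
H = {"alpha1": 1, "alpha2": 3, "alpha3": 6}

# The state occupied by each of u' = 5 bosons, i.e. the values alpha_r for r = 1..5.
alphas = ["alpha1", "alpha3", "alpha1", "alpha2", "alpha1"]

# Equation (7): n_a counts the bosons r whose state alpha_r equals alpha^a.
n = Counter(alphas)

# Equation (10): sum over bosons versus sum over states weighted by occupation.
energy_by_boson = sum(H[a] for a in alphas)
energy_by_state = sum(n[a] * H[a] for a in H)

print(dict(n), energy_by_boson, energy_by_state)
```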


60. The connexion between bosons and oscillators 

In §34 we studied the harmonic oscillator, a dynamical system of one degree of
freedom describable in terms of a canonical $q$ and $p$, such that the Hamiltonian
is a sum of squares of $q$ and $p$, with numerical coefficients. We define a general
oscillator mathematically as a system of one degree of freedom describable in
terms of a canonical $q$ and $p$, such that the Hamiltonian is a power series in $q$
and $p$, and remains so if the system is perturbed in any way. We shall now study
a dynamical system composed of several of these oscillators. We can describe
each oscillator in terms of, instead of $q$ and $p$, a complex dynamical variable $\eta$,
like the $\eta$ of §34, and its conjugate complex $\bar\eta$, satisfying the commutation relation
(7) of §34. We attach labels $1, 2, 3, \ldots$ to the different oscillators, so that the whole
set of oscillators is describable in terms of the dynamical variables $\eta_1, \eta_2, \eta_3, \ldots,$
$\bar\eta_1, \bar\eta_2, \bar\eta_3, \ldots$ satisfying the commutation relations

$$\left.\begin{aligned}
\eta_a\eta_b - \eta_b\eta_a &= 0,\\
\bar\eta_a\bar\eta_b - \bar\eta_b\bar\eta_a &= 0,\\
\bar\eta_a\eta_b - \eta_b\bar\eta_a &= \delta_{ab}.
\end{aligned}\right\} \tag{11}$$

Put

$$\eta_a\bar\eta_a = n_a, \tag{12}$$

so that

$$\bar\eta_a\eta_a = n_a + 1. \tag{13}$$


The $n$'s are observables which commute with one another and the work of §34
shows that each of them has as eigenvalues all non-negative integers. For the $a$th
oscillator there is a standard ket, $|0_a\rangle$ say, which is a normalized eigenket of $n_a$
belonging to the eigenvalue zero. By multiplying all these standard kets together
we get a standard ket for the set of oscillators,

$$|0_1\rangle|0_2\rangle|0_3\rangle \cdots = |0_10_20_3\ldots\rangle, \tag{14}$$

which is a simultaneous eigenket of all the $n$'s belonging to the eigenvalues zero.
The standard ket (14) will be much used in the future and will be denoted simply
by* $\rangle_0$. From (13) of §34

$$\bar\eta_a\rangle_0 = 0 \tag{15}$$

for any $a$. The work of §34 also shows that, if $n'_1, n'_2, n'_3, \ldots$ are any non-negative
integers,

$$\eta_1^{n'_1}\eta_2^{n'_2}\eta_3^{n'_3}\ldots\rangle_0 \tag{16}$$

is a simultaneous eigenket of all the $n$'s belonging to the eigenvalues $n'_1, n'_2, n'_3, \ldots$
respectively. The various kets (16) obtained by taking different $n'$'s form a complete
set of kets all orthogonal to one another and the square of the length of one
of them is, from (16) of §34, $n'_1!\,n'_2!\,n'_3!\ldots$. From this we see, bearing in mind
the result (5), that the kets (16) have just the same properties as the kets (9),
so that we can equate each ket (16) to the ket (9) referring to the same $n'$ values
without getting any inconsistency. This involves putting

$$S|\alpha^a\alpha^b\alpha^c \ldots \alpha^g\rangle = \eta_a\eta_b\eta_c \ldots \eta_g\rangle_0. \tag{17}$$

The standard ket $\rangle_0$ becomes equal to the first of the kets (9), corresponding to
no bosons present.
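The algebra of (11) to (16) can be exercised on a small model. The sketch below is an illustration under stated assumptions, not the author's construction: a ket is stored as a dictionary from occupation tuples $(n'_1, n'_2, n'_3)$ to coefficients, the basis kets are the unnormalized kets (16), $\eta_a$ adds one quantum to oscillator $a$, and $\bar\eta_a$ removes one with a factor $n'_a$, which is what (11) and (15) imply for its action on (16). Only three oscillators are kept, and all names are invented for the example.

```python
from collections import defaultdict
from math import factorial

MODES = 3                       # number of oscillators kept in this toy model
VACUUM = {(0,) * MODES: 1}      # the standard ket, with all n' equal to zero

def eta(a, ket):
    """Apply eta_a: one more quantum in oscillator a (a = 0 .. MODES-1)."""
    out = defaultdict(int)
    for occ, coeff in ket.items():
        new = list(occ)
        new[a] += 1
        out[tuple(new)] += coeff
    return dict(out)

def eta_bar(a, ket):
    """Apply the conjugate complex of eta_a: one quantum fewer, with a factor n'_a."""
    out = defaultdict(int)
    for occ, coeff in ket.items():
        if occ[a] > 0:
            new = list(occ)
            new[a] -= 1
            out[tuple(new)] += coeff * occ[a]
    return dict(out)

def subtract(k1, k2):
    """Difference of two kets, dropping vanishing coefficients."""
    out = defaultdict(int)
    for occ, c in k1.items():
        out[occ] += c
    for occ, c in k2.items():
        out[occ] -= c
    return {occ: c for occ, c in out.items() if c != 0}

def squared_length(ket):
    """Squared length, using <n'|n'> = n'_1! n'_2! ... for the basis kets (16)."""
    total = 0
    for occ, c in ket.items():
        norm = 1
        for n in occ:
            norm *= factorial(n)
        total += abs(c) ** 2 * norm
    return total

ket = eta(0, eta(0, eta(2, VACUUM)))                      # eta_1^2 eta_3 on the standard ket
print(ket, squared_length(ket))                           # squared length 2! 0! 1! = 2
print(eta_bar(1, VACUUM))                                 # equation (15): gives the zero ket
print(subtract(eta_bar(0, eta(0, ket)), eta(0, eta_bar(0, ket))) == ket)   # third of (11), a = b
print(subtract(eta_bar(0, eta(1, ket)), eta(1, eta_bar(0, ket))) == {})    # third of (11), a != b
print(eta(0, eta_bar(0, ket)))                            # equation (12): n_1 ket = 2 ket
```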

The effect of equation (17) is to identify the states of an assembly of bosons with 
the states of a set of oscillators. This means that the dynamical system consisting 
of an assembly of similar bosons is equivalent to the dynamical system consisting 
of a set of oscillators—the two systems are just the same system looked at from two 
different points of view. There is one oscillator associated with each independent 
boson state. We have here one of the most fundamental results of quantum 
mechanics, which enables a unification of the wave and corpuscular theories of 
light to be effected. 

*[$\rangle_0$ is replaced by $|0\rangle$ in the 4th edition.]

Our work in the preceding section was built up on a discrete set of basic kets
$|\alpha^a\rangle$ for a boson. We could pass to a different discrete set of basic kets, $|\beta^A\rangle$ say,
and build up a similar theory on them. The basic kets for the assembly would
then be, instead of (9),

$$|\rangle,\quad |\beta^A\rangle,\quad S|\beta^A\beta^B\rangle,\quad S|\beta^A\beta^B\beta^C\rangle,\quad \ldots. \tag{18}$$

The first of the kets (18), referring to no bosons present, is the same as the first
of the kets (9). Those kets (18) referring to one boson present are linear functions
of those kets (9) referring to one boson present, namely

$$|\beta^A\rangle = \sum_a|\alpha^a\rangle\langle\alpha^a|\beta^A\rangle, \tag{19}$$

and generally those kets (18) referring to $u'$ bosons present are linear functions of
those kets (9) referring to $u'$ bosons present. Associated with the new basic states
$|\beta^A\rangle$ for a boson there will be a new set of oscillator variables $\eta_A$, and corresponding
to (17) we shall have

$$S|\beta^A\beta^B\beta^C\ldots\rangle = \eta_A\eta_B\eta_C\ldots\rangle_0. \tag{20}$$

Thus a ket $\eta_A\eta_B\ldots\rangle_0$ with $u'$ factors $\eta_A, \eta_B, \ldots$ must be a linear function of kets
$\eta_a\eta_b\ldots\rangle_0$ with $u'$ factors $\eta_a, \eta_b, \ldots$. It follows that each linear operator $\eta_A$ must
be a linear function of the $\eta_a$'s. Equation (19) gives

$$\eta_A\rangle_0 = \sum_a\eta_a\rangle_0\,\langle\alpha^a|\beta^A\rangle$$

and hence

$$\eta_A = \sum_a\eta_a\langle\alpha^a|\beta^A\rangle. \tag{21}$$

Thus the $\eta$'s transform according to the same law as the basic kets for
a boson. The transformed $\eta$'s satisfy, with their conjugate complexes, the same
commutation relations (11) as the original ones. The transformed $\eta$'s are on
just the same footing as the original ones and hence, when we look upon our
dynamical system as a set of oscillators, the different degrees of freedom have no
invariant significance.
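The statement that the transformed variables obey the same commutation relations can be checked in one line for the third of the relations (11). The sketch below assumes only (21), its conjugate complex $\bar\eta_A = \sum_a\langle\beta^A|\alpha^a\rangle\bar\eta_a$, and the completeness and orthonormality of the two sets of basic kets:

```latex
\begin{align*}
\bar\eta_A\eta_B - \eta_B\bar\eta_A
  &= \sum_{a,b}\langle\beta^A|\alpha^a\rangle
     \bigl(\bar\eta_a\eta_b - \eta_b\bar\eta_a\bigr)
     \langle\alpha^b|\beta^B\rangle \\
  &= \sum_{a}\langle\beta^A|\alpha^a\rangle\langle\alpha^a|\beta^B\rangle
   = \langle\beta^A|\beta^B\rangle = \delta_{AB}.
\end{align*}
```

The first two relations of (11) follow in the same way, since every term in the corresponding double sums vanishes.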

The $\bar\eta$'s transform according to the same law as the basic bras for a boson,
and thus the same law as the numbers $\langle\alpha^a|x\rangle$ forming the representative of
a state $x$. This similarity people often describe by saying that the $\bar\eta_a$'s are given
by a process of second quantization applied to $\langle\alpha^a|x\rangle$, meaning thereby that,
after one has set up a quantum theory for a single particle and so introduced
the numbers $\langle\alpha^a|x\rangle$ representing a state of the particle, one can make these
numbers into linear operators satisfying with their conjugate complexes the correct
commutation relations, like (11), and one then has the appropriate mathematical
basis for dealing with an assembly of the particles, provided they are bosons.
There is a corresponding procedure for fermions, which will be given in §65.

Since an assembly of bosons is the same as a set of oscillators, it must be 
possible to express any symmetrical function of the boson variables in terms of 
the oscillator variables $\eta$ and $\bar\eta$. An example of this is provided by equation (10)
with $\eta_a\bar\eta_a$ substituted for $n_a$. Let us see how it goes in general. Take first the case
of a function of the boson variables of the form

$$U_T = \sum_r U_r, \qquad (22)$$

where each $U_r$ is a function only of the dynamical variables of the $r$th boson, so that
it has a representative $\langle\alpha_r^a|U_r|\alpha_r^b\rangle$ referring to the basic kets $|\alpha_r^a\rangle$ of the $r$th boson.
In order that $U_T$ may be symmetrical, this representative must be the same for
all $r$, so that it can depend only on the two eigenvalues labelled by $a$ and $b$. We may
therefore write it

$$\langle\alpha_r^a|U_r|\alpha_r^b\rangle = \langle\alpha^a|U|\alpha^b\rangle = \langle a|U|b\rangle \qquad (23)$$

for brevity. We have

$$U_r|\alpha_1^{x_1}\alpha_2^{x_2}\cdots\rangle = \sum_a |\alpha_1^{x_1}\alpha_2^{x_2}\cdots\alpha_r^a\cdots\rangle\langle a|U|x_r\rangle. \qquad (24)$$


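Summing this equation for all values of $r$ and applying the symmetrizing operator $S$
to both sides, we get

$$SU_T|\alpha_1^{x_1}\alpha_2^{x_2}\cdots\rangle = \sum_r\sum_a S|\alpha_1^{x_1}\alpha_2^{x_2}\cdots\alpha_r^a\cdots\rangle\langle a|U|x_r\rangle. \qquad (25)$$

Since $U_T$ is symmetrical we can replace $SU_T$ by $U_TS$ and can then substitute for
the symmetrical kets in (25) their values given by (17). We get in this way

$$U_T\,\eta_{x_1}\eta_{x_2}\cdots\rangle_S = \sum_r\sum_a \eta_{x_1}\eta_{x_2}\cdots\eta_{x_r}^{-1}\eta_a\cdots\rangle_S\,\langle a|U|x_r\rangle$$
$$= \sum_{a,b}\eta_a\sum_r \eta_{x_1}\eta_{x_2}\cdots\eta_{x_r}^{-1}\cdots\rangle_S\,\delta_{bx_r}\langle a|U|b\rangle, \qquad (26)$$

$\eta_{x_r}^{-1}$ meaning that the factor $\eta_{x_r}$ must be cancelled out. Now from (15) and
the commutation relations (11)

$$\bar\eta_b\,\eta_{x_1}\eta_{x_2}\cdots\rangle_S = \sum_r \eta_{x_1}\eta_{x_2}\cdots\eta_{x_r}^{-1}\cdots\rangle_S\,\delta_{bx_r} \qquad (27)$$

(note that $\bar\eta_b$ is like the operator of partial differentiation $\partial/\partial\eta_b$), so (26) becomes

$$U_T\,\eta_{x_1}\eta_{x_2}\cdots\rangle_S = \sum_{a,b}\eta_a\bar\eta_b\,\eta_{x_1}\eta_{x_2}\cdots\rangle_S\,\langle a|U|b\rangle. \qquad (28)$$

The kets $\eta_{x_1}\eta_{x_2}\cdots\rangle_S$ form a complete set, and hence we can infer from (28)
the operator equation

$$U_T = \sum_{a,b}\eta_a\langle a|U|b\rangle\bar\eta_b. \qquad (29)$$

This gives us $U_T$ in terms of the $\eta$ and $\bar\eta$ variables and the matrix elements $\langle a|U|b\rangle$.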
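
Formula (29) can be tried out on occupation-number kets. The sketch below is illustrative only: it assumes a hypothetical boson with two basic states, represents a ket of the assembly as a dictionary from occupation numbers to amplitudes, and uses the usual oscillator rules (obtained later as (46) and (49)) that $\eta_a$ raises $n_a$ with a factor $(n_a+1)^{1/2}$ and $\bar\eta_a$ lowers it with a factor $n_a^{1/2}$. The function names are our own.

```python
import math
from collections import defaultdict

def create(state, a):
    """Apply eta_a: |..n_a..> -> sqrt(n_a + 1) |..n_a + 1..>."""
    out = defaultdict(complex)
    for occ, amp in state.items():
        new = list(occ)
        new[a] += 1
        out[tuple(new)] += amp * math.sqrt(occ[a] + 1)
    return dict(out)

def destroy(state, b):
    """Apply eta_bar_b: |..n_b..> -> sqrt(n_b) |..n_b - 1..>; nothing survives if n_b = 0."""
    out = defaultdict(complex)
    for occ, amp in state.items():
        if occ[b] == 0:
            continue
        new = list(occ)
        new[b] -= 1
        out[tuple(new)] += amp * math.sqrt(occ[b])
    return dict(out)

def apply_U_T(state, U):
    """Apply U_T = sum_{a,b} eta_a <a|U|b> eta_bar_b, as in (29)."""
    out = defaultdict(complex)
    n = len(U)
    for a in range(n):
        for b in range(n):
            if U[a][b] == 0:
                continue
            for occ, amp in create(destroy(state, b), a).items():
                out[occ] += U[a][b] * amp
    return dict(out)

# One boson in state b = 1: the result should be sum_a |a> <a|U|1>.
U = [[1.0, 0.5], [0.5, 2.0]]          # hypothetical matrix elements <a|U|b>
print(apply_U_T({(0, 1): 1.0}, U))

# Diagonal U with n_0 = 2, n_1 = 3: the ket is an eigenket with eigenvalue 2*U_00 + 3*U_11.
print(apply_U_T({(2, 3): 1.0}, [[1.5, 0.0], [0.0, 4.0]]))
```

The second call reproduces the structure of equation (10): for a diagonal $U$ the operator (29) simply counts the bosons in each state and weights each count by the corresponding diagonal element.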
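Now let us take a symmetrical function of the boson variables consisting of
a sum of terms each referring to two bosons,

$$V_T = \sum_{r,\,s\neq r} V_{rs}. \qquad (30)$$

We do not need to assume $V_{rs} = V_{sr}$. Corresponding to (23), $V_{rs}$ has matrix
elements

$$\langle\alpha_r^a\alpha_s^b|V_{rs}|\alpha_r^c\alpha_s^d\rangle = \langle ab|V|cd\rangle \qquad (31)$$

for brevity. Proceeding as before we get, corresponding to (25),

$$SV_T|\alpha_1^{x_1}\alpha_2^{x_2}\cdots\rangle = \sum_{r,\,s\neq r}\sum_{a,b} S|\alpha_1^{x_1}\alpha_2^{x_2}\cdots\alpha_r^a\cdots\alpha_s^b\cdots\rangle\langle ab|V|x_rx_s\rangle \qquad (32)$$

and corresponding to (26)

$$V_T\,\eta_{x_1}\eta_{x_2}\cdots\rangle_S = \sum_{a,b,c,d}\eta_a\eta_b\sum_{r,\,s\neq r}\eta_{x_r}^{-1}\eta_{x_s}^{-1}\eta_{x_1}\eta_{x_2}\cdots\rangle_S\,\delta_{cx_r}\delta_{dx_s}\langle ab|V|cd\rangle. \qquad (33)$$

We can deduce as an extension of (27)

$$\bar\eta_c\bar\eta_d\,\eta_{x_1}\eta_{x_2}\cdots\rangle_S = \sum_{r,\,s\neq r}\eta_{x_r}^{-1}\eta_{x_s}^{-1}\eta_{x_1}\eta_{x_2}\cdots\rangle_S\,\delta_{cx_r}\delta_{dx_s}, \qquad (34)$$

so that (33) becomes

$$V_T\,\eta_{x_1}\eta_{x_2}\cdots\rangle_S = \sum_{a,b,c,d}\eta_a\eta_b\bar\eta_c\bar\eta_d\,\eta_{x_1}\eta_{x_2}\cdots\rangle_S\,\langle ab|V|cd\rangle,$$

giving us the operator equation

$$V_T = \sum_{a,b,c,d}\eta_a\eta_b\langle ab|V|cd\rangle\bar\eta_c\bar\eta_d. \qquad (35)$$

The method can readily be extended to give any symmetrical function of the boson
variables in terms of the $\eta$'s and $\bar\eta$'s.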
The foregoing theory can easily be generalized to apply to an assembly of 
bosons in interaction with some other dynamical system, which we shall call 
for definiteness the atom. We must introduce a set of basic kets, $|\zeta'\rangle$ say,
for the atom alone. We can then get a set of basic kets for the whole system
of atom and bosons together by multiplying each of the kets $|\zeta'\rangle$ into each of
the kets (9). We may write these kets

$$|\zeta'\rangle,\quad |\zeta'\alpha^a\rangle,\quad S|\zeta'\alpha^a\alpha^b\rangle,\quad S|\zeta'\alpha^a\alpha^b\alpha^c\rangle,\quad \ldots \qquad (36)$$

We may look upon the system as composed of the atom in interaction with a set
of oscillators, so that it can be described in terms of the atom variables and
the oscillator variables $\eta_a$, $\bar\eta_a$. Using again the standard ket $\rangle_S$ for the set of
oscillators, we have

$$S|\zeta'\alpha^a\alpha^b\alpha^c\cdots\rangle = \eta_a\eta_b\eta_c\cdots\rangle_S|\zeta'\rangle, \qquad (37)$$

corresponding to (17), as the equation expressing the basic kets (36) in terms of
the oscillator variables.

Any function of the atom variables and boson variables which is symmetrical 
between all the bosons is expressible as a function of the atom variables and 
the $\eta$'s and $\bar\eta$'s. Consider first a function $U_T$ of the form (22) with $U_r$ a function
only of the atom variables and the variables of the $r$th boson, so that it has
a representative $\langle\zeta'\alpha_r^a|U_r|\zeta''\alpha_r^b\rangle$. This representative must be independent of $r$
in order that $U_T$ may be symmetrical between all the bosons, so we may write
it $\langle\zeta'\alpha^a|U|\zeta''\alpha^b\rangle$. Now let us define $\langle a|U|b\rangle$ to be that function of the atom
variables whose representative is $\langle\zeta'\alpha^a|U|\zeta''\alpha^b\rangle$, so that we have*

$$\langle\zeta'\alpha_r^a|U_r|\zeta''\alpha_r^b\rangle = \langle\zeta'\alpha^a|U|\zeta''\alpha^b\rangle = \langle\zeta'|\,\langle a|U|b\rangle\,|\zeta''\rangle, \qquad (38)$$

corresponding to (23). The equations (24)-(28) can now be taken over and applied
to the present work if both sides of all these equations are multiplied by $|\zeta'\rangle$ on
the right, with the result that formula (29) still holds. We can deal similarly with
a symmetrical function $V_T$ of the form (30) with $V_{rs}$ a function only of the atom
variables and the variables of the $r$th and $s$th bosons. Defining $\langle ab|V|cd\rangle$ to be
that function of the atom variables whose representative is

$$\langle\zeta'\alpha_r^a\alpha_s^b|V_{rs}|\zeta''\alpha_r^c\alpha_s^d\rangle,$$

we find that formula (35) still holds.


61. Emission and absorption of bosons 

Let us suppose that the oscillators of the preceding section are harmonic oscillators 
and there is no interaction between them. The energy of the $a$th oscillator is then,
from (5) of §34,

$$H_a = \hbar\omega_a\eta_a\bar\eta_a + \tfrac12\hbar\omega_a.$$

We shall neglect the constant term $\tfrac12\hbar\omega_a$, which is the energy of the oscillator in
its lowest state, the so-called 'zero-point energy'. This neglect does not have any
dynamical consequences, as explained at the beginning of §30, and merely involves
a redefinition of $H_a$. The total energy of all the oscillators is now

$$H_T = \sum_a H_a = \sum_a \hbar\omega_a\eta_a\bar\eta_a = \sum_a \hbar\omega_a n_a \qquad (39)$$

with the help of (12). This is of the same form as (10), with $\hbar\omega_a$ for $H^a$. Thus a set
of harmonic oscillators is equivalent to an assembly of bosons in stationary states
with no interaction between them. If an oscillator of the set is in its $n'$th quantum
state, there are $n'$ bosons in the associated boson state.

*[‘a@’ in the superscripts of a instead of another a]

In general the Hamiltonian for the set of oscillators will be a power series in 
the variables $\eta_a$, $\bar\eta_a$, say

$$H_T = H_P + \sum_a (U_a\eta_a + \bar U_a\bar\eta_a) + \sum_{a,b} (U_{ab}\eta_a\bar\eta_b + V_{ab}\eta_a\eta_b + \bar V_{ab}\bar\eta_a\bar\eta_b) + \cdots, \qquad (40)$$

where $H_P$, $U_a$, $U_{ab}$, $V_{ab}$ are numbers, $H_P$ being real and $U_{ab} = \bar U_{ba}$. If the set
of oscillators are in interaction with an atom, as we had at the end of
the preceding section, the total Hamiltonian will still be of the form (40), with $H_P$,
$U_a$, $U_{ab}$, $V_{ab}$ functions of the atom variables, $H_P$ in particular being the Hamiltonian
for the atom by itself. A general treatment of this dynamical system would be
rather complicated and for practical applications one assumes that the terms

$$H_P + \sum_a U_{aa}\eta_a\bar\eta_a \qquad (41)$$

are large compared with the others and form by themselves an unperturbed system,
the remaining terms being taken into account as a perturbation producing
transitions in the unperturbed system, according to the theory of §44.
If, further, $U_{aa}$ is independent of the atom variables, the unperturbed system
with Hamiltonian (41) consists merely of an atom with Hamiltonian $H_P$ and
an assembly of bosons in stationary states with Hamiltonian of the form (39),
with no interaction. 

Let us consider what kinds of transitions are produced by the various 
perturbation terms in (40). Take a stationary state of the unperturbed system 
for which the atom is in a stationary state, $\zeta'$ say, and bosons are present in
the stationary boson states $a, b, c, \ldots$. This stationary state for the unperturbed
system corresponds to the ket

$$\eta_a\eta_b\eta_c\cdots\rangle_S|\zeta'\rangle, \qquad (42)$$

like (37). If the term $U_x\eta_x$ of (40) is multiplied into this ket, the result is a linear
combination of kets like

$$\eta_x\eta_a\eta_b\eta_c\cdots\rangle_S|\zeta''\rangle, \qquad (43)$$

$\zeta''$ denoting any stationary state of the atom. The ket (43) refers to one more boson
than the ket (42), the extra boson being in the state $x$. Thus the perturbation
term $U_x\eta_x$ gives rise to transitions in which one boson is emitted into state $x$ and
the atom makes an arbitrary jump. If the term $\bar U_x\bar\eta_x$ of (40) is multiplied into (42),
the result is zero unless (42) contains a factor $\eta_x$ and is then a linear combination
of kets like

$$\eta_x^{-1}\eta_a\eta_b\eta_c\cdots\rangle_S|\zeta''\rangle,$$

referring to one boson less in state $x$. Thus the perturbation term $\bar U_x\bar\eta_x$ gives rise
to transitions in which one boson is absorbed from state $x$, the atom again making
an arbitrary jump. Similarly, we find that a perturbation term $U_{xy}\eta_x\bar\eta_y$ $(x \neq y)$
gives rise to processes in which a boson is absorbed from state $y$ and one is emitted
into state $x$, or, what is the same thing physically, one boson makes a transition
from state $y$ to state $x$. This kind of process would be produced by a term
like the $U_T$ of (22) and (29) in the perturbation energy, provided the diagonal
elements $\langle a|U|a\rangle$ vanish. Again, the perturbation terms $V_{xy}\eta_x\eta_y$, $\bar V_{xy}\bar\eta_x\bar\eta_y$ give rise
to processes in which two bosons are emitted or absorbed, and so on for more
complicated terms. With any of these emission and absorption processes the atom
can make an arbitrary jump.

Let us determine how the probability of occurrence of each of these transition 
processes depends on the numbers of bosons originally present in the various 
boson states. From §§44 & 46 the transition probability is always proportional 
to the square of the modulus of the matrix element of the perturbation energy 
referring to the two states concerned. Thus the probability of a boson being 
emitted into state $x$ with the atom making a jump from state $\zeta'$ to state $\zeta''$ is
proportional to

$$|\langle\zeta''|\langle n_1'n_2'\cdots(n_x'+1)\cdots|U_x\eta_x|n_1'n_2'\cdots n_x'\cdots\rangle|\zeta'\rangle|^2, \qquad (44)$$

the $n'$'s being the numbers of bosons initially present in the various boson states.
Now from (6) and (17), with reference to (4),

$$|n_1'n_2'n_3'\cdots\rangle = (n_1'!\,n_2'!\,n_3'!\cdots)^{-1/2}\,\eta_1^{n_1'}\eta_2^{n_2'}\eta_3^{n_3'}\cdots\rangle_S, \qquad (45)$$

so that

$$\eta_x|n_1'n_2'\cdots n_x'\cdots\rangle = (n_x'+1)^{1/2}\,|n_1'n_2'\cdots(n_x'+1)\cdots\rangle. \qquad (46)$$

Hence (44) is equal to

$$(n_x'+1)\,|\langle\zeta''|U_x|\zeta'\rangle|^2, \qquad (47)$$

showing that the probability of a transition in which a boson is emitted into state $x$
is proportional to the number of bosons originally in state $x$ plus one.

The probability of a boson being absorbed from state $x$ with the atom making
a jump from state $\zeta'$ to state $\zeta''$ is proportional to

$$|\langle\zeta''|\langle n_1'n_2'\cdots(n_x'-1)\cdots|\bar U_x\bar\eta_x|n_1'n_2'\cdots n_x'\cdots\rangle|\zeta'\rangle|^2, \qquad (48)$$

the $n'$'s again being the numbers of bosons initially present in the various
boson states. Now from (45)*

$$\bar\eta_x|n_1'n_2'\cdots n_x'\cdots\rangle = {n_x'}^{1/2}\,|n_1'n_2'\cdots(n_x'-1)\cdots\rangle, \qquad (49)$$

so (48) is equal to†

$$n_x'\,|\langle\zeta''|\bar U_x|\zeta'\rangle|^2. \qquad (50)$$

Thus the probability of a transition in which a boson is absorbed from state $x$ is
proportional to the number of bosons originally in state $x$.
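
The two results can be tabulated with a few lines of code. This is a minimal sketch, assuming only the oscillator amplitudes $(n_x'+1)^{1/2}$ of (46) and ${n_x'}^{1/2}$ of (49); the atomic matrix element is a hypothetical number chosen for illustration, and the function names are our own.

```python
import math

def raise_amp(n):
    """Amplitude with which eta_x takes n bosons in state x to n + 1, from (46)."""
    return math.sqrt(n + 1)

def lower_amp(n):
    """Amplitude with which eta_bar_x takes n bosons in state x to n - 1, from (49)."""
    return math.sqrt(n)

def emission_weight(n, atom_element):
    """Quantity (47): (n + 1) times the squared modulus of the atomic element."""
    return abs(raise_amp(n) * atom_element) ** 2

def absorption_weight(n, atom_element):
    """Quantity (50): n times the squared modulus of the atomic element."""
    return abs(lower_amp(n) * atom_element) ** 2

u = 0.3 + 0.4j  # hypothetical atomic matrix element, squared modulus 0.25
for n in range(4):
    print(n, emission_weight(n, u), absorption_weight(n, u))
```

With no bosons present the absorption weight vanishes while the emission weight does not, which is the distinction the next section interprets as spontaneous emission.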

Similar methods may be applied to more complicated processes, and show that 
the probability of a process in which a boson makes a transition from state y to 
state $x$ $(x \neq y)$ is proportional to $n_y'(n_x'+1)$. More generally, the probability
of a process in which bosons are absorbed from states $x, y, \ldots$ and emitted into
states $a, b, \ldots$ is proportional to

$$n_x'n_y'\cdots(n_a'+1)(n_b'+1)\cdots, \qquad (51)$$

the $n'$'s being in each case the numbers of bosons originally present. These results
hold both for direct transition processes and transition processes that take place 
through one or more intermediate states, in accordance with the interpretation 
given at the end of §44. 
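
As a small illustration, the factor (51) may be computed for any listed process. The sketch assumes that the absorbing and emitting states are all different from one another, as in the text; the function name and the dictionary representation of the occupation numbers are our own choices.

```python
def boson_factor(occupation, absorbed, emitted):
    """Factor (51) for a process absorbing from `absorbed` and emitting into `emitted`.

    `occupation` maps a state label to the number of bosons originally present;
    states not listed are taken to be empty. All states named in `absorbed` and
    `emitted` are assumed to be distinct.
    """
    factor = 1
    for state in absorbed:
        factor *= occupation.get(state, 0)
    for state in emitted:
        factor *= occupation.get(state, 0) + 1
    return factor

# A boson passing from state y (5 present) to state x (2 present).
print(boson_factor({"x": 2, "y": 5}, absorbed=["y"], emitted=["x"]))
```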

*[If (46) is considered instead then the affix S may be omitted and the equation takes the
form of the equivalent one in the 4th edition.]
†[The second $\zeta$ is corrected to one prime.]


62. Application to photons 

Since photons are bosons, the foregoing theory can be applied to them. A photon 
is in a stationary state when it is in an eigenstate of momentum. It then has two 
independent states of polarization, which may be taken to be two perpendicular 
states of linear polarization. The dynamical variables needed to describe 
the stationary states are then the momentum $\mathbf p$, a vector, and a polarization
variable $\mathbf l$, consisting of a unit vector perpendicular to $\mathbf p$. The variables $\mathbf p$ and $\mathbf l$
take the place of our previous $\alpha$'s. The eigenvalues of $\mathbf p$ consist of all numbers
from $-\infty$ to $\infty$ for each of the three Cartesian components of $\mathbf p$, while for each
eigenvalue $\mathbf p'$ of $\mathbf p$, $\mathbf l$ has just two eigenvalues, namely two arbitrarily chosen vectors
perpendicular to $\mathbf p'$ and to one another. Owing to the eigenvalues of $\mathbf p$ forming
a continuous range, there are a continuous range of stationary states, giving us
the continuous basic kets $|\mathbf p'\mathbf l'\rangle$. However, the foregoing theory was built up in
terms of discrete basic kets $|\alpha'\rangle$ for a boson. There are two formalisms which one
may use for getting over this discrepancy. 

The first consists in replacing the continuous three-dimensional distribution 
of eigenvalues for p by a large number of discrete points lying very close 
together, forming a dust spread over the whole three-dimensional p-space. 
Let $s_{\mathbf p'}$ be the density of the dust (the number of points per unit volume) in
the neighbourhood of any point $\mathbf p'$. Then $s_{\mathbf p'}$ must be large and positive, but is
otherwise an arbitrary function of $\mathbf p'$. An integral over the $\mathbf p$-space may be replaced
by a sum over the dust of points, in accordance with the formula

$$\iiint f(\mathbf p')\,dp_x'\,dp_y'\,dp_z' = \sum_{\mathbf p'} f(\mathbf p')\,s_{\mathbf p'}^{-1}, \qquad (52)$$

which formula provides the basis of the passage from continuous $\mathbf p'$ values to
discrete ones and vice versa. Any problem can be worked out in terms of
the discrete $\mathbf p'$ values, for which the theory of §§59-61 can be used, and the results
can be transformed back to refer to continuous $\mathbf p'$ values. The arbitrary density
$s_{\mathbf p'}$ should then disappear from the results.
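
The disappearance of the arbitrary density can be seen in a one-dimensional analogue of (52). The sketch below is illustrative only: it assumes a single momentum component, a Gaussian test function, and two hypothetical dusts of points, one uniform and one crowded towards $p = 0$. Each dust is generated by mapping evenly spaced values $u$ in $(0, 1)$ to points $p(u)$, so that the reciprocal density at a point is $(dp/du)/N$ for $N$ points.

```python
import math

def dust_sum(f, mapping, d_mapping, n_points):
    """Sum of f(p_i) / s(p_i) over a dust of points p_i = mapping(u_i).

    The u_i are evenly spaced in (0, 1), so the reciprocal of the local
    density of points is d_mapping(u_i) / n_points. This is the
    one-dimensional analogue of the right-hand side of (52).
    """
    total = 0.0
    for i in range(n_points):
        u = (i + 0.5) / n_points
        inverse_density = d_mapping(u) / n_points
        total += f(mapping(u)) * inverse_density
    return total

def gaussian(p):
    return math.exp(-p * p)

L = 8.0  # cut-off; the Gaussian is negligible beyond it

# Dust A: uniform density over [-L, L].
uniform = dust_sum(gaussian, lambda u: L * (2 * u - 1), lambda u: 2 * L, 20000)

# Dust B: points crowded near p = 0, via p = L * (2u - 1)**3.
crowded = dust_sum(gaussian, lambda u: L * (2 * u - 1) ** 3,
                   lambda u: 6 * L * (2 * u - 1) ** 2, 20000)

exact = math.sqrt(math.pi) * math.erf(L)  # integral of exp(-p^2) over [-L, L]
print(uniform, crowded, exact)
```

Both sums agree with the integral to high accuracy although the two densities differ greatly, which is the sense in which the density is arbitrary provided it is large.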

The second formalism consists in modifying the equations of the theory 
of §§59-61 so as to make them apply to the case of a continuous range of
basic kets $|\alpha'\rangle$, by replacing sums by integrals and replacing the $\delta$ symbol in
the commutation relations (11) by $\delta$ functions, so far as concerns the variables
with continuous eigenvalues. Each of these formalisms has some advantages and 
some disadvantages. The first is usually more convenient for physical discussion, 
the second for mathematical development. Both will be developed here and one 
or other will be used according to which is more suitable at the moment. 

The Hamiltonian describing an assembly of photons interacting with an atom 
will be of the general form (40), with the coefficients $H_P$, $U_a$, $U_{ab}$, $V_{ab}$ involving
the atom variables. This Hamiltonian may be written

$$H_T = H_P + H_R + H_Q, \qquad (53)$$

where $H_P$ is the energy of the atom alone, $H_R$ is the energy of the assembly of
photons alone,

$$H_R = \sum_{\mathbf p'\mathbf l'} n_{\mathbf p'\mathbf l'}\,h\nu_{\mathbf p'}, \qquad (54)$$

$\nu_{\mathbf p'}$ being the frequency of a photon of momentum $\mathbf p'$, and $H_Q$ is the interaction
energy, which can be evaluated from analogy with the classical theory, as will be
shown in the next section. The whole system can be treated by a perturbation
method as discussed in the preceding section, $H_P$ and $H_R$ providing the energy (41)
of the unperturbed system and $H_Q$ being the perturbation energy, which gives rise
to transition processes in which photons are emitted and absorbed and the atom
jumps from one stationary state to another.

We saw in the preceding section that the probability of an absorption process is 
proportional to the number of bosons originally in the state from which a boson is 
absorbed. From this we can infer that the probability of a photon being absorbed 
from a beam of radiation incident on an atom is proportional to the intensity of 
the beam. We also saw that the probability of an emission process is proportional 
to the number of bosons originally in the state concerned plus one. To interpret 
this result we must make a careful study of the relations involved in replacing 
the continuous range of photon states by a discrete set. 

Let us neglect for the present the polarization variable $\mathbf l$. Let $|\mathbf p'D\rangle$ be
the normalized ket corresponding to the discrete photon state $\mathbf p'$. Then from (22)
of §16

$$\sum_{\mathbf p'} |\mathbf p'D\rangle\langle\mathbf p'D| = 1,$$

which gives from (52)

$$\int |\mathbf p'D\rangle\langle\mathbf p'D|\,s_{\mathbf p'}\,d^3p' = 1, \qquad (55)$$

$d^3p'$ being written for $dp_x'\,dp_y'\,dp_z'$ for brevity. Now if $|\mathbf p'\rangle$ is the basic ket
corresponding to the continuous state $\mathbf p'$, we have according to (24) of §16

$$\int |\mathbf p'\rangle\langle\mathbf p'|\,d^3p' = 1,$$

which shows, on comparison with (55), that

$$|\mathbf p'\rangle = |\mathbf p'D\rangle\,s_{\mathbf p'}^{1/2}. \qquad (56)$$

The connexion between $|\mathbf p'\rangle$ and $|\mathbf p'D\rangle$ is like the connexion between the basic kets
when one changes the weight function of the representation, as shown by (38)
of §16.


With $n_{\mathbf p'}'$ photons in each discrete photon state $\mathbf p'$, the Gibbs density $\rho$ for
the assembly of photons is, according to (68) of §33,

$$\rho = \sum_{\mathbf p'} |\mathbf p'D\rangle\,n_{\mathbf p'}'\,\langle\mathbf p'D| = \int |\mathbf p'D\rangle\,n_{\mathbf p'}'\,\langle\mathbf p'D|\,s_{\mathbf p'}\,d^3p' = \int |\mathbf p'\rangle\,n_{\mathbf p'}'\,\langle\mathbf p'|\,d^3p' \qquad (57)$$

with the help of (56). The number of photons per unit volume in
the neighbourhood of any point $\mathbf x'$ is then $\langle\mathbf x'|\rho|\mathbf x'\rangle$, according to (73) of §33.
From (57) this equals

$$\langle\mathbf x'|\rho|\mathbf x'\rangle = \int \langle\mathbf x'|\mathbf p'\rangle\,n_{\mathbf p'}'\,\langle\mathbf p'|\mathbf x'\rangle\,d^3p' = h^{-3}\int n_{\mathbf p'}'\,d^3p' \qquad (58)$$

if one puts in the value of the transformation function $\langle\mathbf x'|\mathbf p'\rangle$ given by (54) of §23.
Equation (58) expresses the number of photons per unit volume as an integral over
the momentum space, so the integrand in (58) can be interpreted as the number of
photons per unit of phase space. We obtain in this way the result that the number
of photons per unit of phase space is equal to $h^{-3}$ times the number of photons
per discrete state; in other words, a cell of volume $h^3$ in phase space is equivalent
to a discrete state. This result is a general one, holding for any kind of particle.
If the polarization variable of the photons is not neglected, the result holds for
each of the two independent states of polarization.

The momentum of a photon of frequency $\nu$ is of magnitude $h\nu/c$, so the element
of momentum space

$$dp_x\,dp_y\,dp_z = h^3c^{-3}\nu^2\,d\nu\,d\omega,$$

$d\omega$ being an element of solid angle for the direction of the vector $\mathbf p$.
Thus a distribution of photons with $n_{\mathbf p'}'$ per discrete state, which is equivalent to
a distribution of $h^{-3}n_{\mathbf p'}'\,d^3p\,d^3x$ photons in an element of volume $d^3x$ and an element
of momentum space $d^3p$, equals a distribution of $n_{\mathbf p'}'c^{-3}\nu^2\,d\nu\,d\omega\,d^3x$ photons in
an element of volume $d^3x$ and a frequency range $d\nu$ and direction of motion $d\omega$.
This corresponds to an energy density $n_{\mathbf p'}'hc^{-3}\nu^3$ per unit solid angle per unit
frequency range, or an intensity per unit frequency range (i.e. an energy crossing
unit area per unit time per unit frequency range) of amount

$$I_\nu = n_{\mathbf p'}'\,h\nu^3/c^2. \qquad (59)$$

The result that the probability of a photon being emitted is proportional
to $n_{\mathbf p'\mathbf l'}' + 1$, $n_{\mathbf p'\mathbf l'}'$ being the number of photons initially present in the discrete
state concerned, can now be interpreted as the probability being proportional
to $I_{\nu\mathbf l} + h\nu^3/c^2$, where $I_{\nu\mathbf l}$ is the intensity of the incident radiation per unit
frequency range in the neighbourhood of the frequency of the emitted photon
and having the same polarization $\mathbf l$ as the emitted photon. Thus with no incident
radiation there is still a certain amount of emission, but the emission is increased
or stimulated by incident radiation in the same direction and having the same
frequency and polarization as the emitted radiation. The present theory of
radiation thus completes the imperfect one of §45 by giving both stimulated and
spontaneous emission. The ratio it gives for the two kinds of emission, namely
$I_{\nu\mathbf l} : h\nu^3/c^2$, is in agreement with that provided by Albert Einstein's theory of
statistical equilibrium mentioned in §45.
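
The conversion between the number of photons per discrete state and the intensity may be made concrete as follows. This is a minimal sketch of relation (59) only; the numerical values of $h$ and $c$ are the CGS ones, and the frequency and occupation numbers are arbitrary illustrative choices, not taken from the text.

```python
H_PLANCK = 6.62607015e-27   # erg s (CGS value of Planck's constant h)
C_LIGHT = 2.99792458e10     # cm / s

def spontaneous_term(nu):
    """h nu^3 / c^2, the term accompanying the '+1' in n + 1."""
    return H_PLANCK * nu ** 3 / C_LIGHT ** 2

def intensity_from_occupation(n, nu):
    """Equation (59): I_nu = n h nu^3 / c^2 for n photons per discrete state."""
    return n * spontaneous_term(nu)

def stimulated_to_spontaneous(intensity, nu):
    """Ratio I_nu : h nu^3 / c^2, which equals the occupation number n."""
    return intensity / spontaneous_term(nu)

nu = 5.0e14  # Hz, an illustrative optical frequency
for n in (0.0, 1.0e-3, 1.0, 50.0):
    i_nu = intensity_from_occupation(n, nu)
    print(n, i_nu, stimulated_to_spontaneous(i_nu, nu))
```

The ratio of stimulated to spontaneous emission is thus just the number of photons per discrete state, so stimulated emission dominates only when that number is large compared with unity.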

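The probability of a photon being scattered from the state $\mathbf p'\mathbf l'$ to the state $\mathbf p''\mathbf l''$
is proportional to $n_{\mathbf p'\mathbf l'}'(n_{\mathbf p''\mathbf l''}' + 1)$, the $n'$'s being the numbers of photons initially
in the discrete states concerned. We can interpret this result as the probability
being proportional to

$$I_{\nu'\mathbf l'}\,(I_{\nu''\mathbf l''} + h{\nu''}^3/c^2). \qquad (60)$$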


Similarly for a more general radiative process in which several photons are emitted
and absorbed, the probability is proportional to a factor $I_{\nu\mathbf l}$ for each absorbed
photon and a factor $I_{\nu\mathbf l} + h\nu^3/c^2$ for each emitted photon. Thus the process is
stimulated by incident radiation in the same direction and with the same frequency 
and polarization as any of the emitted photons. 


63. The interaction energy between photons and an atom

We shall now determine the interaction energy between an atom and an assembly 
of photons, i.e. the $H_Q$ of equation (53), from analogy with the classical expression
for the interaction energy between an atom and a field of radiation. For simplicity
we shall suppose the atom to consist of a single electron moving in an electrostatic
field of force. The field of radiation may be described by a scalar and a vector
potential. These potentials are to a certain extent arbitrary and may be chosen
so that the scalar potential vanishes. The field is then completely described
by the vector potential $A_x$, $A_y$, $A_z$, or $\mathbf A$. The change that the field causes in
the Hamiltonian describing the atom is now, as explained at the beginning of §41,

$$H_Q = \frac{1}{2m}\left\{\left(\mathbf p + \frac{e}{c}\mathbf A\right)^2 - \mathbf p^2\right\} = \frac{e}{mc}(\mathbf p, \mathbf A) + \frac{e^2}{2mc^2}(\mathbf A, \mathbf A). \qquad (61)$$
This is the classical interaction energy. The A that occurs here should be the value 
of the vector potential at the point where the electron is momentarily situated. 
It is, however, a good enough approximation if we take this A to be the vector 
potential at some fixed point in the atom, such as the nucleus, provided we are 
dealing with radiation whose wavelength is large compared with the dimensions of 
the atom. 

Let us first consider the field of radiation classically and ignore its interaction 
with the atom. The vector potential A satisfies, according to Maxwell’s theory, 
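the equations

$$\Box\mathbf A = 0, \qquad \mathrm{div}\,\mathbf A = 0, \qquad (62)$$

$\Box$ being short for* $(1/c^2)\,\partial^2/\partial t^2 - \partial^2/\partial x^2 - \partial^2/\partial y^2 - \partial^2/\partial z^2$. The first of these
equations shows that $\mathbf A$ can be resolved into Fourier components in the form†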


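$$\mathbf A = \int \left\{\mathbf A_{\mathbf k}\,e^{i(\mathbf k,\mathbf x) + 2\pi i\nu_{\mathbf k}t} + \bar{\mathbf A}_{\mathbf k}\,e^{-i(\mathbf k,\mathbf x) - 2\pi i\nu_{\mathbf k}t}\right\} d^3k, \qquad (63)$$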


each Fourier component representing a train of waves moving with the velocity 
of light, described by a vector k whose direction gives the direction of motion of 
the waves and whose magnitude $|\mathbf k|$ is connected with their frequency $\nu_{\mathbf k}$ by

$$2\pi\nu_{\mathbf k} = c|\mathbf k|. \qquad (64)$$

The vector $\mathbf k$ is just the momentum of a photon which the quantum theory
would associate with these waves, divided by $\hbar$. For each value of $\mathbf k$ we have
an amplitude $\mathbf A_{\mathbf k}$, which is in general a complex vector, and the integral in (63)
extends over the whole of the three-dimensional $\mathbf k$-space. The second of equations
(62) gives

$$(\mathbf k, \mathbf A_{\mathbf k}) = 0, \qquad (65)$$

showing that, for each value of $\mathbf k$, $\mathbf A_{\mathbf k}$ is perpendicular to $\mathbf k$. This expresses
that the waves are transverse waves. $\mathbf A_{\mathbf k}$ is determined by its two components
in two directions perpendicular to each other and to $\mathbf k$, these two components
corresponding to two independent states of linear polarization.

*[The solidus of the numerical factor is treated differently from the solidus of a differential
coefficient here, so they are not combined unlike in the original.]
†[The original has factors of the exponent of the form (kx). These are rewritten as (k,x) in
all cases when not otherwise defined.]

The total energy of the radiation is given by the volume integral

$$H_R = (8\pi)^{-1}\int (\mathcal E^2 + \mathcal H^2)\,d^3x \qquad (66)$$

taken over the whole of space, where the electric field $\mathcal E$ and the magnetic field $\mathcal H$
of the radiation are given by

$$\mathcal E = -\frac{1}{c}\frac{\partial\mathbf A}{\partial t}, \qquad \mathcal H = \mathrm{curl}\,\mathbf A. \qquad (67)$$


Using standard formulae of vector analysis, we have 
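$$\mathrm{div}\,[\mathbf A \times \mathcal H] = (\mathcal H, \mathrm{curl}\,\mathbf A) - (\mathbf A, \mathrm{curl}\,\mathcal H) = \mathcal H^2 - (\mathbf A, \mathrm{curl}\,\mathrm{curl}\,\mathbf A) = \mathcal H^2 + (\mathbf A, \nabla^2\mathbf A)$$

with the help of the second of equations (62). Thus (66) becomes, with neglect of
a term which can be transformed to a surface integral at infinity,

$$H_R = (8\pi)^{-1}\int \left\{\frac{1}{c^2}\left(\frac{\partial\mathbf A}{\partial t}, \frac{\partial\mathbf A}{\partial t}\right) - (\mathbf A, \nabla^2\mathbf A)\right\} d^3x. \qquad (68)$$

By substituting for $\mathbf A$ here its value given by (63), we can get the energy of
the radiation in terms of the Fourier amplitudes $\mathbf A_{\mathbf k}$. The energy of the radiation is
constant (since we are now ignoring the interaction of the radiation and the atom),
so in this calculation we may take $t = 0$. This means taking

$$\mathbf A = \int (\mathbf A_{\mathbf k} + \bar{\mathbf A}_{-\mathbf k})\,e^{i(\mathbf k,\mathbf x)}\,d^3k, \qquad (69)$$

$$\nabla^2\mathbf A = -\int k^2(\mathbf A_{\mathbf k} + \bar{\mathbf A}_{-\mathbf k})\,e^{i(\mathbf k,\mathbf x)}\,d^3k,$$

$$\partial\mathbf A/\partial t = ic\int |\mathbf k|\,(\mathbf A_{\mathbf k} - \bar{\mathbf A}_{-\mathbf k})\,e^{i(\mathbf k,\mathbf x)}\,d^3k. \qquad (70)$$

Inserting these expressions in (68), we get

$$H_R = (8\pi)^{-1}\iiint \left\{k^2(\mathbf A_{\mathbf k} + \bar{\mathbf A}_{-\mathbf k}, \mathbf A_{\mathbf k'} + \bar{\mathbf A}_{-\mathbf k'}) - |\mathbf k||\mathbf k'|(\mathbf A_{\mathbf k} - \bar{\mathbf A}_{-\mathbf k}, \mathbf A_{\mathbf k'} - \bar{\mathbf A}_{-\mathbf k'})\right\} e^{i(\mathbf k,\mathbf x)}e^{i(\mathbf k',\mathbf x)}\,d^3k\,d^3k'\,d^3x$$
$$= \pi^2\iint \left\{k^2(\mathbf A_{\mathbf k} + \bar{\mathbf A}_{-\mathbf k}, \mathbf A_{\mathbf k'} + \bar{\mathbf A}_{-\mathbf k'}) - |\mathbf k||\mathbf k'|(\mathbf A_{\mathbf k} - \bar{\mathbf A}_{-\mathbf k}, \mathbf A_{\mathbf k'} - \bar{\mathbf A}_{-\mathbf k'})\right\} \delta(\mathbf k + \mathbf k')\,d^3k\,d^3k',$$

with the help of formula (49) of §23, $\delta(\mathbf k + \mathbf k')$ being the product of three factors,
one for each component of $\mathbf k$. Hence

$$H_R = \pi^2\int k^2\left\{(\mathbf A_{\mathbf k} + \bar{\mathbf A}_{-\mathbf k}, \mathbf A_{-\mathbf k} + \bar{\mathbf A}_{\mathbf k}) - (\mathbf A_{\mathbf k} - \bar{\mathbf A}_{-\mathbf k}, \mathbf A_{-\mathbf k} - \bar{\mathbf A}_{\mathbf k})\right\} d^3k$$
$$= 2\pi^2\int k^2\left\{(\mathbf A_{\mathbf k}, \bar{\mathbf A}_{\mathbf k}) + (\bar{\mathbf A}_{-\mathbf k}, \mathbf A_{-\mathbf k})\right\} d^3k$$
$$= 4\pi^2\int k^2(\mathbf A_{\mathbf k}, \bar{\mathbf A}_{\mathbf k})\,d^3k. \qquad (71)$$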
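
The algebraic step leading to (71), in which the cross terms between $\mathbf A_{\mathbf k}$ and $\mathbf A_{-\mathbf k}$ cancel, can be verified numerically. The sketch below assumes the integrand to be $(\mathbf A_{\mathbf k} + \bar{\mathbf A}_{-\mathbf k}, \mathbf A_{-\mathbf k} + \bar{\mathbf A}_{\mathbf k}) - (\mathbf A_{\mathbf k} - \bar{\mathbf A}_{-\mathbf k}, \mathbf A_{-\mathbf k} - \bar{\mathbf A}_{\mathbf k})$ apart from the common factor $\pi^2k^2$, and checks that it equals $2\{(\mathbf A_{\mathbf k}, \bar{\mathbf A}_{\mathbf k}) + (\bar{\mathbf A}_{-\mathbf k}, \mathbf A_{-\mathbf k})\}$ for randomly chosen complex vectors. The vectors are arbitrary stand-ins and are not constrained to be transverse, since the identity does not depend on that.

```python
import random

def dot(u, v):
    """Bilinear scalar product (u, v) of two complex 3-vectors, with no conjugation."""
    return sum(a * b for a, b in zip(u, v))

def conj(u):
    return [a.conjugate() for a in u]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

rng = random.Random(0)

def random_vector():
    return [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(3)]

A_k, A_mk = random_vector(), random_vector()   # stand-ins for A_k and A_{-k}

# Integrand of the first line of (71), without the common factor pi^2 k^2.
first = (dot(add(A_k, conj(A_mk)), add(A_mk, conj(A_k)))
         - dot(sub(A_k, conj(A_mk)), sub(A_mk, conj(A_k))))

# Integrand of the second line: 2 {(A_k, conj A_k) + (conj A_{-k}, A_{-k})}.
second = 2 * (dot(A_k, conj(A_k)) + dot(conj(A_mk), A_mk))

print(abs(first - second))
```

The two terms in the second line give equal contributions after integration over all $\mathbf k$, since the substitution of $-\mathbf k$ for $\mathbf k$ turns one into the other, and this accounts for the final factor $4\pi^2$.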


We can replace the continuous distribution of k-values by a dust of discrete 
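$\mathbf k$-values, as we did with the $\mathbf p$-values in the preceding section. The integral
(71) then goes over, according to formula (52), into the sum

$$H_R = 4\pi^2\sum_{\mathbf k} k^2(\mathbf A_{\mathbf k}, \bar{\mathbf A}_{\mathbf k})\,s_{\mathbf k}^{-1},$$

$s_{\mathbf k}$ being the density of the discrete $\mathbf k$-values. We may also write this as

$$H_R = 4\pi^2\sum_{\mathbf k,\mathbf l} k^2 A_{\mathbf k\mathbf l}\bar A_{\mathbf k\mathbf l}\,s_{\mathbf k}^{-1}, \qquad (72)$$

$A_{\mathbf k\mathbf l}$ being a component of $\mathbf A_{\mathbf k}$ in a direction $\mathbf l$ perpendicular to $\mathbf k$ and the summation
with respect to $\mathbf l$ referring to two directions $\mathbf l$ perpendicular to each other.
Thus there is one term in (72) for each independent stationary state for a photon.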

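The field quantities $\mathcal E$ and $\mathcal H$ at any point $\mathbf x$ can be looked upon as dynamical
variables. The quantities

$$A_{\mathbf k\mathbf lt} = A_{\mathbf k\mathbf l}\,e^{2\pi i\nu_{\mathbf k}t}, \qquad \bar A_{\mathbf k\mathbf lt} = \bar A_{\mathbf k\mathbf l}\,e^{-2\pi i\nu_{\mathbf k}t}$$

are then dynamical variables at time $t$, since they are connected with $\mathcal E$ and $\mathcal H$
at various points $\mathbf x$ at time $t$ by equations which do not involve $t$, as follows
from (63) and (67). $A_{\mathbf k\mathbf l}$ is constant, so $A_{\mathbf k\mathbf lt}$ varies with $t$ according to the simple
harmonic law. Thus $A_{\mathbf k\mathbf lt}$ is like the $\eta_t$ of a harmonic oscillator, defined by (3) of §34,
the $\omega$ of the oscillator being $2\pi\nu_{\mathbf k}$. We may take each $A_{\mathbf k\mathbf lt}$ to be proportional to
the $\eta_t$ of some harmonic oscillator and then the field of radiation becomes a set of
harmonic oscillators.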

Let us now pass over to the quantum theory and take the $A_{\mathbf k\mathbf lt}$, $\bar A_{\mathbf k\mathbf lt}$ to be
dynamical variables in the Heisenberg picture. The expression (72) for the energy
may be retained unchanged, the order in which the factors $A_{\mathbf k\mathbf l}$, $\bar A_{\mathbf k\mathbf l}$ there occur
being the correct one to give no zero-point energy. The $A_{\mathbf k\mathbf lt}$ then still vary with
time according to the $e^{2\pi i\nu_{\mathbf k}t}$ law and may still be taken to be proportional to the $\eta$'s
of harmonic oscillators. The factor of proportionality may be obtained by equating
(72) to the expression (39) for the energy, with the label $a$ replaced by the two
labels $\mathbf k$ and $\mathbf l$ and with $h\nu_{\mathbf k}$ for $\hbar\omega_a$. This gives

$$4\pi^2\sum_{\mathbf k,\mathbf l} k^2 A_{\mathbf k\mathbf lt}\bar A_{\mathbf k\mathbf lt}\,s_{\mathbf k}^{-1} = \sum_{\mathbf k,\mathbf l} h\nu_{\mathbf k}\,\eta_{\mathbf k\mathbf lt}\bar\eta_{\mathbf k\mathbf lt},$$

the suffix $t$ being inserted to show that we are dealing with Heisenberg dynamical
variables (as we should when transferring equations of the classical theory to
the quantum theory). Hence, using (64),

$$4\pi^2 A_{\mathbf k\mathbf lt} = c\,h^{1/2}\nu_{\mathbf k}^{-1/2}\,\eta_{\mathbf k\mathbf lt}\,s_{\mathbf k}^{1/2}, \qquad (73)$$

with neglect of an unimportant arbitrary phase factor. In this way the Heisenberg
dynamical variables $\eta_{\mathbf k\mathbf lt}$, which describe the field of radiation as a set of oscillators,
are introduced. The commutation relations between the $\eta_{\mathbf k\mathbf lt}$ and $\bar\eta_{\mathbf k\mathbf lt}$ are known,
being given by (11), so equation (73) fixes the commutation relations between
the $A_{\mathbf k\mathbf lt}$ and $\bar A_{\mathbf k\mathbf lt}$. It thus fixes the commutation relations between the potentials $\mathbf A$
and the field quantities $\mathcal E$ and $\mathcal H$ at various points $\mathbf x$ at the time $t$. (Incidentally,
the commutation relations of the $A_{\mathbf k\mathbf l}$, $\bar A_{\mathbf k\mathbf l}$ are fixed, so the commutation relation
of two potential or field quantities at two different times is also fixed.)

We can still use (73) when the interaction between the field of radiation and 
the atom is taken into account. This involves assuming that the interaction does 
not affect the commutation relations between the potentials and field quantities 
at a given time. The interaction causes the $\eta_{\mathbf k\mathbf lt}$'s to cease to vary according to
the simple harmonic law and the oscillators to cease to be harmonic. Thus it may 
affect the commutation relation between two potential or field quantities at two 
different times. 

We can now take over the interaction energy (61) into the quantum theory, 
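putting $\mathbf p_t$ for $\mathbf p$ to show it is a Heisenberg dynamical variable. Taking the atomic
nucleus to be at the origin we get, by substituting (63) with $\mathbf x = 0$ into (61),

$$H_{Qt} = \frac{e}{mc}\int (\mathbf p_t, \mathbf A_{\mathbf kt} + \bar{\mathbf A}_{\mathbf kt})\,d^3k + \frac{e^2}{2mc^2}\iint (\mathbf A_{\mathbf kt} + \bar{\mathbf A}_{\mathbf kt}, \mathbf A_{\mathbf k't} + \bar{\mathbf A}_{\mathbf k't})\,d^3k\,d^3k'$$
$$= \frac{e}{mc}\sum_{\mathbf k} (\mathbf p_t, \mathbf A_{\mathbf kt} + \bar{\mathbf A}_{\mathbf kt})\,s_{\mathbf k}^{-1} + \frac{e^2}{2mc^2}\sum_{\mathbf k,\mathbf k'} (\mathbf A_{\mathbf kt} + \bar{\mathbf A}_{\mathbf kt}, \mathbf A_{\mathbf k't} + \bar{\mathbf A}_{\mathbf k't})\,s_{\mathbf k}^{-1}s_{\mathbf k'}^{-1}$$

if we pass from continuous to discrete $\mathbf k$-values. Thus

$$H_{Qt} = \frac{e}{mc}\sum_{\mathbf k,\mathbf l} p_{\mathbf lt}(A_{\mathbf k\mathbf lt} + \bar A_{\mathbf k\mathbf lt})\,s_{\mathbf k}^{-1} + \frac{e^2}{2mc^2}\sum_{\mathbf k,\mathbf k',\mathbf l,\mathbf l'} (A_{\mathbf k\mathbf lt} + \bar A_{\mathbf k\mathbf lt})(A_{\mathbf k'\mathbf l't} + \bar A_{\mathbf k'\mathbf l't})(\mathbf l, \mathbf l')\,s_{\mathbf k}^{-1}s_{\mathbf k'}^{-1},$$

$p_{\mathbf lt}$ being the component of $\mathbf p_t$ in the direction $\mathbf l$. With the help of (73) we may
express $H_{Qt}$ in terms of the $\eta_{\mathbf k\mathbf lt}$ and $\bar\eta_{\mathbf k\mathbf lt}$, and we can then drop the suffix $t$ (which
means going over to Schrödinger dynamical variables), so that we obtain finally

$$H_Q = \frac{eh^{1/2}}{4\pi^2m}\sum_{\mathbf k,\mathbf l} \nu_{\mathbf k}^{-1/2}\,p_{\mathbf l}\,(\eta_{\mathbf k\mathbf l} + \bar\eta_{\mathbf k\mathbf l})\,s_{\mathbf k}^{-1/2}$$
$$\quad + \frac{e^2h}{32\pi^4m}\sum_{\mathbf k,\mathbf k',\mathbf l,\mathbf l'} \nu_{\mathbf k}^{-1/2}\nu_{\mathbf k'}^{-1/2}(\eta_{\mathbf k\mathbf l} + \bar\eta_{\mathbf k\mathbf l})(\eta_{\mathbf k'\mathbf l'} + \bar\eta_{\mathbf k'\mathbf l'})(\mathbf l, \mathbf l')\,s_{\mathbf k}^{-1/2}s_{\mathbf k'}^{-1/2}. \qquad (74)$$

With the model of the atom we are using, the interaction energy appears as
a linear plus a quadratic function in the $\eta$'s and $\bar\eta$'s. The linear terms give rise to
emission and absorption processes, the quadratic ones to scattering processes and
processes in which two photons are absorbed or emitted simultaneously. The order
of the factors $\eta$ and $\bar\eta$ in the quadratic terms is not determined by the procedure
of working from the classical theory, but this order is unimportant, since a change
in it merely changes $H_Q$ by a constant.

The matrix element of $H_Q$ referring to the emission of a photon into the discrete
state $\mathbf k\mathbf l$, or into the discrete state $\mathbf p'\mathbf l$, as it may also be labelled, with the atom
jumping from state $\alpha^0$ to state $\alpha'$, is

$$\langle\mathbf p'D\,\mathbf l\,\alpha'|H_Q|\alpha^0\rangle = \frac{eh^{1/2}}{4\pi^2m\,{\nu'}^{1/2}}\langle\alpha'|p_{\mathbf l}|\alpha^0\rangle\,s_{\mathbf k}^{-1/2} = \frac{e}{h(2\pi\nu')^{1/2}m}\langle\alpha'|p_{\mathbf l}|\alpha^0\rangle\,s_{\mathbf p'}^{-1/2},$$

since $s_{\mathbf k} = s_{\mathbf p}\hbar^3$. The $p_{\mathbf l}$ occurring here, referring to the momentum of the electron,
is, of course, quite distinct from the other letters $\mathbf p$, referring to the momentum of
the emitted photon. To avoid confusion we shall replace the electron momentum
$p_{\mathbf l}$ by $m\dot x_{\mathbf l}$, these two dynamical variables being the same for the unperturbed atom.
Passing over to continuous photon states by means of the conjugate imaginary of
equation (56), we get*

$$\langle\mathbf p'\mathbf l\,\alpha'|H_Q|\alpha^0\rangle = \frac{e}{h(2\pi\nu')^{1/2}}\langle\alpha'|\dot x_{\mathbf l}|\alpha^0\rangle. \qquad (75)$$

Similarly, the matrix element of $H_Q$ referring to the absorption of a photon from
the continuous state $\mathbf p^0\mathbf l$ with the atom jumping from state $\alpha^0$ to state $\alpha'$ is

$$\langle\alpha'|H_Q|\mathbf p^0\mathbf l\,\alpha^0\rangle = \frac{e}{(2\pi\nu^0)^{1/2}h}\langle\alpha'|\dot x_{\mathbf l}|\alpha^0\rangle, \qquad (76)$$

and the matrix element referring to the scattering of a photon from the continuous
state $\mathbf p^0\mathbf l^0$ to the continuous state $\mathbf p'\mathbf l'$ with the atom jumping from state $\alpha^0$ to
state $\alpha'$ is

$$\langle\mathbf p'\mathbf l'\alpha'|H_Q|\mathbf p^0\mathbf l^0\alpha^0\rangle = \frac{e^2}{2\pi h^2m\,(\nu'\nu^0)^{1/2}}(\mathbf l', \mathbf l^0)\,\delta_{\alpha'\alpha^0}, \qquad (77)$$

there being two terms in (74) which contribute to it. These matrix elements will
be used in the next section. The matrix elements referring to the simultaneous
absorption or emission of two photons may be written down in the same way,
but they lead to physical effects too small to be of practical importance.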




64. Emission, absorption and scattering of radiation

We can now determine directly the coefficients of emission, absorption, 
and scattering of radiation by substituting in the formulae of Chapter VIII 
the values for the matrix elements given by (75), (76), and (77). 

For determining the emission probabibty we can use formula (56) of 853. 
This shows that for an atom in a state a? the probability per unit time per unit 
solid angle of its spontaneously emitting a photon and dropping to a state a’ of 
lower energy is An? WPle 1 

gy oe cop elale' ) (78) 
Now the energy and momentum of a photon of frequency ν are
$$W = h\nu, \qquad P = h\nu/c.$$
Again, from the Heisenberg law (20) of §29,
$$\langle a'|\dot{x}_l|a^0\rangle = -2\pi i\,\nu(a^0,a')\,\langle a'|x_l|a^0\rangle,$$


*la° throughout.] 
ν(a°, a') being the frequency connected with transitions from state a° to state a',
which in the present case is just the frequency ν of the emitted radiation.
These results substituted in (78) make the emission coefficient reduce to

$$\frac{(2\pi\nu)^3}{hc^3}\,\bigl|\langle a'|e x_l|a^0\rangle\bigr|^2. \tag{79}$$

To obtain the rate of emission of energy per unit solid angle for a specified
polarization, we must multiply this by hν. This gives for the total rate of emission
of energy in all directions

$$\frac{4}{3}\,\frac{(2\pi\nu)^4}{c^3}\,\bigl|\langle a'|e\mathbf{x}|a^0\rangle\bigr|^2, \tag{80}$$


which is in agreement with expression (34) of §45 and justifies Werner Heisenberg’s 
assumption for the interpretation of his matrix elements. 
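
The step from a rate per unit solid angle for one polarization to the total rate in all directions rests on an angular factor: summing $|(\mathbf{d},\mathbf{l})|^2$ over the two polarizations perpendicular to the direction of emission and integrating over all directions gives $(8\pi/3)|\mathbf{d}|^2$, which is what turns a coefficient $(2\pi\nu)^3\,\nu$ into $\tfrac{4}{3}(2\pi\nu)^4$. The following is a minimal numerical sketch of that factor only. The vector `d` is a hypothetical real dipole matrix element chosen for illustration, and the function name is not from the text.

```python
import math

def polarization_integral(d, n_theta=200, n_phi=200):
    """Midpoint-rule integral over all emission directions of
    (d.l1)^2 + (d.l2)^2, where l1 and l2 are the two unit polarization
    vectors perpendicular to the direction of emission."""
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        st, ct = math.sin(theta), math.cos(theta)
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            sp, cp = math.sin(phi), math.cos(phi)
            l1 = (ct * cp, ct * sp, -st)   # polarization in the meridian plane
            l2 = (-sp, cp, 0.0)            # polarization along the azimuth
            s1 = sum(a * b for a, b in zip(d, l1))
            s2 = sum(a * b for a, b in zip(d, l2))
            total += (s1 * s1 + s2 * s2) * st * d_theta * d_phi
    return total

d = (0.3, -1.2, 0.5)            # hypothetical real dipole matrix element
d_squared = sum(c * c for c in d)
angular_factor = polarization_integral(d) / d_squared
print(angular_factor, 8.0 * math.pi / 3.0)
```

The two printed numbers should agree to within the accuracy of the midpoint rule, independently of the direction chosen for `d`.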

In the same way the absorption coefficient, given by formula (59) of §53,
becomes for photons

$$\frac{4\pi^2h^2W}{c^2P}\left|\frac{e}{h(2\pi\nu)^{1/2}}\,\langle a'|\dot{x}_l|a^0\rangle\right|^2 = \frac{8\pi^3\nu}{c}\,\bigl|\langle a'|e x_l|a^0\rangle\bigr|^2.$$

This absorption coefficient refers to an incident beam of one photon crossing unit
area per unit time per unit energy range. If we take one per unit frequency range
instead of energy range, as is usual when dealing with radiation, the absorption
coefficient becomes

$$\frac{8\pi^3\nu}{hc}\,\bigl|\langle a'|e x_l|a^0\rangle\bigr|^2.$$

This result is the same as (32) of §45, if we substitute for the $E_\nu$ there
the energy hν of a single photon. Thus the elementary theory of §45, in which
the radiation field is treated as an external perturbation, gives the correct value for 
the absorption coefficient. 

This agreement between the elementary theory and the present theory could 
be inferred from general arguments. The two theories differ only in that the field 
quantities all commute with one another in the elementary theory and satisfy 
definite commutation relations in the present theory, and this difference becomes 
unimportant for strong fields. Thus the two theories must give the same absorption 
and emission when strong fields are concerned. Since both theories give the rate of 
absorption proportional to the intensity of the incident beam, the agreement must 
hold also for weak fields in the case of absorption. In the same way the stimulated 
part of the emission in the present theory must agree with the emission in 
the elementary theory. 

Let us now consider scattering. The direct scattering coefficient is given by 
formula (38) of §50. Such scattering of photons will not be accompanied by any 
change of state of the atom on account of the factor $\delta_{a'a^0}$ in the expression for
the matrix element (77). Thus the final energy W' of the photon will equal
its initial energy W°. The scattering coefficient now reduces to

$$\frac{e^4}{m^2c^4}\,(\mathbf{l}',\mathbf{l}^0)^2.$$

This is the same as that given by classical mechanics for the scattering of radiation
by a free electron. We thus see that the direct scattering of radiation by an electron 
in an atom is independent of the atom and is correctly given by the classical theory. 
This result, it should be remembered, holds only provided the wavelength of 
the radiation is large compared with the dimensions of the atom. 

The direct scattering is a mathematical concept and cannot be separated out 
experimentally from the total scattering, given by formula (44) of §51. Let us
see what this total scattering is in the case of photons. We must be careful in
our application of formula (44) of §51. The summation $\sum_k$ in this formula may be
considered as representing the contribution to the scattering of double transitions
consisting of transitions firstly from the initial state to state k and secondly from
state k to the final state. The first transition may be an absorption of the incident
photon and the second an emission of the required scattered photon, but it is also
possible for the first transition to be the emission and the second the absorption.
It is clear from the general nature of the method used for deriving formula (44) of
§51 that both these kinds of double transitions must be included in the summation
$\sum_k$ when this formula is applied to photons, although only the first of them appears
in the actual derivation given in §51, as the possibility of the particle being created
or annihilated was not taken into account there. 

We use zero, single prime and double prime to refer to the initial, final and
intermediate states of the atom respectively, and zero and single prime to refer to
the absorbed and emitted photons respectively. Then, for the double transition of
absorption followed by emission, we must take for the matrix elements
$\langle k|V|p^0a^0\rangle$, $\langle p'a'|V|k\rangle$ of the formula (44) of §51

$$\langle k|V|p^0a^0\rangle = \langle a''|H_Q|p^0l^0a^0\rangle, \qquad \langle p'a'|V|k\rangle = \langle p'l'a'|H_Q|a''\rangle.$$

Also
$$E' - E_k = h\nu^0 + H_P(a^0) - H_P(a'') = h[\nu^0 - \nu(a'',a^0)],$$
where
$$h\nu(a'',a^0) = H_P(a'') - H_P(a^0).$$

Similarly, for the double transition of emission followed by absorption we must take

$$\langle k|V|p^0a^0\rangle = \langle p'l'a''|H_Q|a^0\rangle, \qquad \langle p'a'|V|k\rangle = \langle a'|H_Q|p^0l^0a''\rangle$$

and
$$E' - E_k = h\nu^0 + H_P(a^0) - H_P(a'') - h\nu^0 - h\nu' = -h[\nu' + \nu(a'',a^0)],$$

there being now two photons, of frequencies ν° and ν', in existence for
the intermediate state. Substituting in (44) of §51 the values of the matrix elements
given by (75), (76) and (77), we get for the scattering coefficient

$$\frac{e^4\nu'}{h^2c^4\nu^0}\left|\frac{h}{m}(\mathbf{l}',\mathbf{l}^0)\,\delta_{a'a^0} + \sum_{a''}\left\{\frac{\langle a'|\dot{x}_{l'}|a''\rangle\langle a''|\dot{x}_{l^0}|a^0\rangle}{\nu^0-\nu(a'',a^0)} - \frac{\langle a'|\dot{x}_{l^0}|a''\rangle\langle a''|\dot{x}_{l'}|a^0\rangle}{\nu'+\nu(a'',a^0)}\right\}\right|^2. \tag{81}$$
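If we write (81) in terms of x instead of $\dot{x}$, we get

$$\frac{(2\pi e)^4\nu'}{h^2c^4\nu^0}\left|\frac{\hbar}{2\pi m}(\mathbf{l}',\mathbf{l}^0)\,\delta_{a'a^0} - \sum_{a''}\nu(a',a'')\,\nu(a'',a^0)\left\{\frac{\langle a'|x_{l'}|a''\rangle\langle a''|x_{l^0}|a^0\rangle}{\nu^0-\nu(a'',a^0)} - \frac{\langle a'|x_{l^0}|a''\rangle\langle a''|x_{l'}|a^0\rangle}{\nu'+\nu(a'',a^0)}\right\}\right|^2. \tag{82}$$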


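We can simplify (82) with the help of the quantum conditions. We have
$$x_{l'}x_{l^0} - x_{l^0}x_{l'} = 0,$$
which gives

$$\sum_{a''}\bigl\{\langle a'|x_{l'}|a''\rangle\langle a''|x_{l^0}|a^0\rangle - \langle a'|x_{l^0}|a''\rangle\langle a''|x_{l'}|a^0\rangle\bigr\} = 0, \tag{83}$$

and also
$$x_{l'}\dot{x}_{l^0} - \dot{x}_{l^0}x_{l'} = \frac{1}{m}(x_{l'}p_{l^0} - p_{l^0}x_{l'}) = \frac{i\hbar}{m}(\mathbf{l}',\mathbf{l}^0),$$
which gives

$$\sum_{a''}\bigl\{\langle a'|x_{l'}|a''\rangle\,\nu(a'',a^0)\,\langle a''|x_{l^0}|a^0\rangle - \nu(a',a'')\,\langle a'|x_{l^0}|a''\rangle\langle a''|x_{l'}|a^0\rangle\bigr\} = \frac{1}{2\pi i}\,\frac{i\hbar}{m}(\mathbf{l}',\mathbf{l}^0)\,\delta_{a'a^0} = \frac{\hbar}{2\pi m}(\mathbf{l}',\mathbf{l}^0)\,\delta_{a'a^0}. \tag{84}$$

Multiplying (83) by ν' and adding to (84), we obtain

$$\sum_{a''}\bigl\{\langle a'|x_{l'}|a''\rangle\langle a''|x_{l^0}|a^0\rangle[\nu'+\nu(a'',a^0)] - \langle a'|x_{l^0}|a''\rangle\langle a''|x_{l'}|a^0\rangle[\nu'+\nu(a',a'')]\bigr\} = \frac{\hbar}{2\pi m}(\mathbf{l}',\mathbf{l}^0)\,\delta_{a'a^0}.$$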


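If we substitute this expression for $(\hbar/2\pi m)(\mathbf{l}',\mathbf{l}^0)\delta_{a'a^0}$ in (82), we obtain, after
a straightforward reduction making use of identical relations between the ν's,

$$\frac{(2\pi e)^4}{h^2c^4}\,\nu^0\nu'^3\left|\sum_{a''}\left\{\frac{\langle a'|x_{l'}|a''\rangle\langle a''|x_{l^0}|a^0\rangle}{\nu^0-\nu(a'',a^0)} - \frac{\langle a'|x_{l^0}|a''\rangle\langle a''|x_{l'}|a^0\rangle}{\nu'+\nu(a'',a^0)}\right\}\right|^2. \tag{85}$$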

This gives the scattering coefficient in the form of the effective area that a photon 
has to hit per unit solid angle of scattering. It is known as the Kramers-Heisenberg 
dispersion formula, having been first obtained by these authors from analogies with 
the classical theory of dispersion. 
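
The "identical relations between the ν's" used in the reduction can be checked directly. Writing $u = \nu(a'',a^0)$ and $w = \nu(a',a'')$, conservation of energy gives $u + w = \nu^0 - \nu'$, and the coefficient of each product of matrix elements should collapse to $\nu^0\nu'$ over the corresponding denominator. The sketch below verifies this with exact rational arithmetic; the function name and the numerical frequencies are illustrative assumptions, not values from the text.

```python
from fractions import Fraction

def reduced_coefficients(nu0, nu_prime, u):
    """Coefficients multiplying the two products of matrix elements after the
    commutator identity is substituted into the scattering sum.
    u stands for nu(a'', a0); w = nu(a', a'') follows from energy conservation."""
    w = nu0 - nu_prime - u
    first = (nu_prime + u) - w * u / (nu0 - u)
    second = -(nu_prime + w) + w * u / (nu_prime + u)
    return first, second

# Hypothetical frequencies, chosen only so that no denominator vanishes.
nu0, nu_prime, u = Fraction(7), Fraction(5), Fraction(3, 2)
first, second = reduced_coefficients(nu0, nu_prime, u)

expected_first = nu0 * nu_prime / (nu0 - u)
expected_second = -nu0 * nu_prime / (nu_prime + u)
print(first == expected_first, second == expected_second)
```

Both comparisons come out true, so the sum acquires an overall factor $\nu^0\nu'$; squaring it and multiplying by $\nu'/\nu^0$ gives the factor $\nu^0\nu'^3$.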

The fact that the various terms in (82) can be combined to give the result (85)
justifies the assumption made in deriving formula (44) of §51, that the matrix
elements $\langle p'a'|V|p''a''\rangle$ of the interaction energy are of the second order of
smallness compared with the $\langle p'a'|V|k\rangle$ ones, at any rate when the scattered
particles are photons.


65. An assembly of fermions 

An assembly of fermions can be treated by a method similar to that used in 
§§59 and 60 for bosons. With the kets (1) we may use the antisymmetrizing
operator A defined by
$$A = u'!^{-1/2}\sum \pm P, \tag{2'}$$
summed over all permutations P, the + or − sign being taken according to whether
P is even or odd. Applied to the ket (1) it gives

$$u'!^{-1/2}\sum \pm P\,|\alpha_1^a\alpha_2^b\alpha_3^c\ldots\alpha_{u'}^g\rangle = A|\alpha^a\alpha^b\alpha^c\ldots\alpha^g\rangle, \tag{3'}$$

a ket corresponding to a state for an assembly of u' fermions. The ket (3') is
normalized provided the individual fermion kets $|\alpha^a\rangle$, $|\alpha^b\rangle$, ... are all different,
otherwise it is zero. In this respect the ket (3') is simpler than the ket (3). However,
(3') is more complicated than (3) in that (3') depends on the order in which
$\alpha^a, \alpha^b, \alpha^c, \ldots$ occur in it, being subject to a change of sign if an odd permutation
is applied to this order.

We can, as before, introduce the numbers $n_1, n_2, n_3, \ldots$ of fermions in the states
$\alpha^{(1)}, \alpha^{(2)}, \alpha^{(3)}, \ldots$ and treat them as dynamical variables or observables. They each
have as eigenvalues only 0 and 1. They form a complete set of commuting
observables for the assembly of fermions. The basic kets of a representation with
the n's diagonal may be taken to be connected with the kets (3') by the equation

$$A|\alpha^a\alpha^b\alpha^c\ldots\rangle = \pm|n_1'n_2'n_3'\ldots\rangle \tag{6'}$$

corresponding to (6), the $n'$'s being connected with the variables $\alpha^a, \alpha^b, \alpha^c, \ldots$
by equation (4). The ± sign is needed in (6') since, for given $n'$'s, the occupied
states $\alpha^a, \alpha^b, \alpha^c, \ldots$ are fixed but not their order, so that the sign of the left-hand
side of (6') is not fixed. To set up a rule which determines the sign in (6'),
we must arrange all the states α for a fermion arbitrarily in some standard order.
The α's occurring in the left-hand side of (6') form a certain selection from
all the α's and the standard order for all the α's will give a standard order
for this selection. We now make the rule that the + sign should occur in (6')
if the α's on the left-hand side can be brought into their standard order by
an even permutation and the − sign if an odd permutation is required. Owing to
the complexity of this rule, the representation with the basic kets $|n_1'n_2'n_3'\ldots\rangle$ is
not a very useful one.


If the number of fermions in the assembly is variable, we can set up the complete
set of kets
$$|\rangle, \quad |\alpha^a\rangle, \quad A|\alpha^a\alpha^b\rangle, \quad A|\alpha^a\alpha^b\alpha^c\rangle, \quad \ldots, \tag{9'}$$
corresponding to (9). A general ket is now expressible as a sum of the various
kets (9').


To continue with the development we introduce a set of linear operators $\eta$, $\bar\eta$,
one pair $\eta_a$, $\bar\eta_a$ corresponding to each fermion state $\alpha^a$, satisfying the commutation
relations

$$\eta_a\eta_b + \eta_b\eta_a = 0, \qquad \bar\eta_a\bar\eta_b + \bar\eta_b\bar\eta_a = 0, \qquad \bar\eta_a\eta_b + \eta_b\bar\eta_a = \delta_{ab}. \tag{11'}$$

These relations are like (11) with a + sign instead of a − sign on the left-hand side.
They show that, for a ≠ b, $\eta_a$ and $\bar\eta_a$ anticommute with $\eta_b$ and $\bar\eta_b$, while, putting
b = a, they give
$$\eta_a^2 = 0, \qquad \bar\eta_a^2 = 0, \qquad \bar\eta_a\eta_a + \eta_a\bar\eta_a = 1. \tag{11''}$$
To verify that the relations (11') are consistent, we note that linear operators $\eta$, $\bar\eta$
satisfying the conditions (11') can be constructed in the following way. For each
state $\alpha^a$ we take a set of linear operators $\sigma_{xa}, \sigma_{ya}, \sigma_{za}$ like the $\sigma_x, \sigma_y, \sigma_z$ introduced
in §37 to describe the spin of an electron and such that $\sigma_{xa}, \sigma_{ya}, \sigma_{za}$ commute with
$\sigma_{xb}, \sigma_{yb}, \sigma_{zb}$ for b ≠ a. We also take an independent set of linear operators $\zeta_a$,
one for each state $\alpha^a$, which all anticommute with one another and have their
squares unity, and commute with all the σ variables. Then, putting
$$\eta_a = \tfrac12\zeta_a(\sigma_{xa} - i\sigma_{ya}), \qquad \bar\eta_a = \tfrac12\zeta_a(\sigma_{xa} + i\sigma_{ya}),$$

we have all the conditions (11’) satisfied. 
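
The consistency argument can be made concrete for two fermion states. The following is a minimal sketch, assuming that the two ζ operators are realised as Pauli matrices acting on an extra two-dimensional factor (any anticommuting set with unit squares that commutes with the σ's would serve equally well); the helper names are illustrative. It builds $\eta_a = \tfrac12\zeta_a(\sigma_{xa} - i\sigma_{ya})$ and its conjugate as 8 × 8 matrices and tests the three anticommutation relations.

```python
def kron(A, B):
    """Kronecker product of two matrices given as nested lists."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def lincomb(c1, A, c2, B):
    """Return the matrix c1*A + c2*B."""
    return [[c1 * x + c2 * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def close(A, B, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(A, B) for x, y in zip(ra, rb))

I2 = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]

def on_factor(op, position):
    """Embed a 2x2 operator in one of three tensor factors (0, 1 or 2)."""
    factors = [I2, I2, I2]
    factors[position] = op
    return kron(kron(factors[0], factors[1]), factors[2])

DIM = 8
IDENT = [[1 if i == j else 0 for j in range(DIM)] for i in range(DIM)]
ZERO = [[0] * DIM for _ in range(DIM)]

sigma_x = [on_factor(SX, 0), on_factor(SX, 1)]   # one sigma set per fermion state
sigma_y = [on_factor(SY, 0), on_factor(SY, 1)]
zeta = [on_factor(SX, 2), on_factor(SY, 2)]      # assumed realisation of the zetas

eta = [lincomb(0.5, matmul(zeta[a], sigma_x[a]),
               -0.5j, matmul(zeta[a], sigma_y[a])) for a in range(2)]
eta_bar = [lincomb(0.5, matmul(zeta[a], sigma_x[a]),
                   0.5j, matmul(zeta[a], sigma_y[a])) for a in range(2)]

def anticommutator(A, B):
    return lincomb(1, matmul(A, B), 1, matmul(B, A))

relations_hold = all(
    close(anticommutator(eta[a], eta[b]), ZERO)
    and close(anticommutator(eta_bar[a], eta_bar[b]), ZERO)
    and close(anticommutator(eta_bar[a], eta[b]), IDENT if a == b else ZERO)
    for a in range(2) for b in range(2)
)
print(relations_hold)
```

The ζ factor is what makes operators belonging to different states anticommute; without it the σ sets for different states would commute.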

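From (11'')
$$(\eta_a\bar\eta_a)^2 = \eta_a\bar\eta_a\eta_a\bar\eta_a = \eta_a(1 - \eta_a\bar\eta_a)\bar\eta_a = \eta_a\bar\eta_a.$$
This is an algebraic equation for $\eta_a\bar\eta_a$, showing that $\eta_a\bar\eta_a$ is an observable with
the eigenvalues 0 and 1. Also $\eta_a\bar\eta_a$ commutes with $\eta_b\bar\eta_b$ for b ≠ a. These results
allow us to put
$$\eta_a\bar\eta_a = n_a, \tag{12'}$$
the same as (12). From (11'') we get now
$$\bar\eta_a\eta_a = 1 - n_a, \tag{13'}$$
the equation corresponding to (13).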

Let us write the normalized ket which is an eigenket of all the n's belonging to
the eigenvalues zero as $\rangle_A$.† Then
$$n_a\rangle_A = 0,$$
so from (12')
$${}_A\langle\eta_a\bar\eta_a\rangle_A = 0.$$
Hence
$$\bar\eta_a\rangle_A = 0, \tag{15'}$$
like (15). Again
$${}_A\langle\bar\eta_a\eta_a\rangle_A = {}_A\langle(1 - n_a)\rangle_A = {}_A\langle\,\rangle_A = 1,$$
showing that $\eta_a\rangle_A$ is normalized, and
$$n_a\eta_a\rangle_A = \eta_a\bar\eta_a\eta_a\rangle_A = \eta_a(1 - n_a)\rangle_A = \eta_a\rangle_A,$$
showing that $\eta_a\rangle_A$ is an eigenket of $n_a$ belonging to the eigenvalue unity. It is
an eigenket of the other n's belonging to the eigenvalues zero, since the other
n's commute with $n_a$. By generalizing the argument we see that $\eta_a\eta_b\eta_c\ldots\eta_g\rangle_A$
is normalized and is a simultaneous eigenket of all the n's, belonging to
the eigenvalues unity for $n_a, n_b, n_c, \ldots, n_g$ and zero for the other n's. This enables
us to put
$$A|\alpha^a\alpha^b\alpha^c\ldots\alpha^g\rangle = \eta_a\eta_b\eta_c\ldots\eta_g\rangle_A, \tag{17'}$$
both sides being antisymmetrical in the labels a, b, c, ..., g. We have here
the analogue of (17).

If we pass over to a different set of basic kets $|\beta^A\rangle$ for a fermion, we can
introduce a new set of linear operators $\eta_A$ corresponding to them. We then find,
by the same argument as in the case of bosons, that the new η's are connected
with the original ones by (21). This shows that there is a procedure of second
quantization for fermions, similar to that for bosons, with the only difference
that the commutation relations (11') must be employed for fermions to replace
the commutation relations (11) for bosons.

A symmetrical linear operator $U_T$ of the form (22) can be expressed in terms
of the $\eta$ and $\bar\eta$ variables by a similar method to that used for bosons. Equation (24)
still holds, and so does (25) with S replaced by A. Instead of (26) we now have


†[The 4th edition has $|0\rangle$ for $\rangle_A$.]
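$$U_T\,\eta_{x_1}\eta_{x_2}\ldots\rangle_A = \sum_a \eta_a\sum_r(-1)^{r-1}\,\eta_{x_r}^{-1}\eta_{x_1}\eta_{x_2}\ldots\rangle_A\,\langle a|U|x_r\rangle = \sum_{a,b}\eta_a\sum_r(-1)^{r-1}\,\eta_{x_r}^{-1}\eta_{x_1}\eta_{x_2}\ldots\rangle_A\,\delta_{bx_r}\,\langle a|U|b\rangle, \tag{26'}$$

$\eta_{x_r}^{-1}$ meaning that the factor $\eta_{x_r}$ must be cancelled out, without its position among
the other $\eta_x$'s being changed before the cancellation. Instead of (27) we have

$$\bar\eta_b\,\eta_{x_1}\eta_{x_2}\ldots\rangle_A = \sum_r(-1)^{r-1}\,\eta_{x_r}^{-1}\eta_{x_1}\eta_{x_2}\ldots\rangle_A\,\delta_{bx_r}, \tag{27'}$$

so (28) holds with $\rangle_A$ for $\rangle$, and thus (29) holds unchanged. We have the same final
form (29) for $U_T$ in the fermion case as in the boson case. Similarly, a symmetrical
linear operator $V_T$ of the form (30) can be expressed as
$$V_T = \sum_{a,b,c,d}\eta_a\eta_b\,\langle ab|V|cd\rangle\,\bar\eta_d\bar\eta_c, \tag{35'}$$
the same as one of the ways of writing (35).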
The foregoing work shows that there is a deep-seated analogy between 
the theory of fermions and that of bosons, only slight changes having to be made 
in the general equations of the formalism when one passes from one to the other.* 
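
The one genuinely new ingredient in the fermion formulas is the alternating sign that appears when an operator $\bar\eta_b$ is carried through a product of η's, as in (27'): each η it passes contributes a factor −1, and the term survives only where a matching label is found. A minimal combinatorial sketch of that rule follows. It tracks only labels and signs, not kets, and the function name is illustrative.

```python
def annihilate(b, occupied):
    """Apply the sign rule for eta-bar_b acting on eta_{x1} eta_{x2} ... on the
    empty state. `occupied` is the ordered list of labels x1, x2, ...
    Returns a list of (sign, remaining_labels) terms; for distinct labels
    there is at most one term."""
    terms = []
    for r, label in enumerate(occupied, start=1):
        if label == b:
            sign = (-1) ** (r - 1)          # one sign change per eta passed over
            terms.append((sign, occupied[:r - 1] + occupied[r:]))
    return terms

state = ["a", "b", "c"]
print(annihilate("a", state))   # label in first position: sign +1
print(annihilate("b", state))   # label in second position: sign -1
print(annihilate("d", state))   # label absent: no terms
```

Removing the factor without first moving it to the front is exactly the cancellation "without its position being changed" described in the text; the sign records what moving it would have cost.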


*[After this the 4th edition has a set of paragraphs on the creation and annihilation of
fermions.]


XI. RELATIVISTIC THEORY OF THE ELECTRON


66. Relativistic treatment of a particle 

THE theory we have been building up so far is essentially a non-relativistic one. 
We have been working all the time with one particular Lorentz frame of reference 
and have set up the theory as an analogue of the classical non-relativistic dynamics. 
Let us now try to make the theory invariant under Lorentz transformations, so that 
it conforms to the special principle of relativity.* 

In the first place we note that the general principle of superposition of states, 
as given in Chapter I, is a relativistic principle. It applies to ‘states’ with 
the relativistic space-time meaning. Beyond this, though, the theory does 
not lend itself very well to relativistic treatment, owing to the fundamental 
notion of an ‘observable’ not fitting in very well with the requirements of 
relativity. The measurement of an observable, in the theory we have been 
dealing with up to the present, has always consisted in the measurement of 
some dynamical variable at some instant of time in some Lorentz frame of 
reference and there does not seem to be any very natural way of generalizing this 
notion of an observable to make it cease to refer to a particular Lorentz frame. 
In consequence one cannot set up a scheme of relativistic quantum mechanics 
with the same degree of generality as the non-relativistic theory. All one can 
do is to solve special problems in a Lorentz-invariant way. This should not be 
regarded as a defect of the quantum theory, since it is in perfect analogy with 
the classical theory. Relativistic classical mechanics does not involve any such 
general scheme as the contact transformation theory of non-relativistic classical 
mechanics, but consists in the solution of comparatively special problems. 

One of the special problems that can be handled relativistically is that of 
the motion of a particle in an external field of force. Our non-relativistic quantum 
mechanics applied to this problem can be fitted in with the formalism of relativity 
by a change of notation. We put $x_1, x_2, x_3$ for x, y, z and $x_0$ for ct, so that
the time dependent wave function in Schrödinger's representation appears as
$\psi(x_0, x_1, x_2, x_3)$, in which the four x's may be treated on the same footing. We write
the momentum components as $p_1, p_2, p_3$ instead of $p_x, p_y, p_z$.


*[From this point the rest of the section has been revised in the 4th edition.]


They satisfy
$$p_r\psi\rangle = -i\hbar\,\frac{\partial\psi}{\partial x_r}\rangle \qquad (r = 1, 2, 3). \tag{1}$$
To preserve the symmetry between the four x's we introduce a corresponding linear
operator $p_0$, equal to the energy divided by c, whose effect on $\psi\rangle$ is
$$p_0\psi\rangle = i\hbar\,\frac{\partial\psi}{\partial x_0}\rangle. \tag{2}$$
The difference in sign in (1) and (2) is required by relativity.

We treat $x_0$ and $p_0$ as dynamical variables on the same footing as the other x's
and p's. They provide a new degree of freedom. The standard ket in (1) and (2)
must refer to this new degree of freedom as well as to the previous ones.
The lack of symmetry between the treatment of $x_0$ and that of the other x's
in the non-relativistic theory may be considered as due to our always using
a representation with $x_0$ diagonal and leaving understood the standard ket for
the $(x_0, p_0)$ degree of freedom. It would seem that only representations with $x_0$
diagonal are useful in the non-relativistic theory. We may therefore expect
that in a relativistic theory, which treats all the four x's on the same footing,
only representations with the four x's diagonal will be useful. It then becomes
convenient to leave understood the standard ket for all four degrees of freedom
and to write any ket as a wave function in the four x's.

In the theory of the electron that will be developed here we shall have 
to introduce some further degrees of freedom describing an internal motion of 
the electron. A ket for the whole system will now be written as a ket in these
further degrees of freedom and a wave function in the four x's, and will appear
as $|x_0x_1x_2x_3\rangle$, or $|x\rangle$ for brevity, according to the notation explained near the end
of §20.




67. The wave equation for the electron 
Let us consider first the case of the motion of an electron in the absence of 
an electromagnetic field, so that the problem is simply that of the free particle, 
as dealt with in §30, with the possible addition of internal degrees of freedom.
The relativistic Hamiltonian provided by classical mechanics for this system is
given by equation (23) of §30, and leads to the wave equation
$$\{p_0 - (m^2c^2 + p_1^2 + p_2^2 + p_3^2)^{1/2}\}\,|x\rangle = 0, \tag{3}$$

where the p’s are interpreted as operators in accordance with equations (1) and (2). 
Equation (3), although it takes into account the relation between energy and 
momentum required by relativity, is yet unsatisfactory from the point of view of 
relativistic theory, because it is very unsymmetrical between $p_0$ and the other p's,
so much so that one cannot generalize it in a relativistic way to the case when
there is a field present. We must therefore look for a new wave equation.

If we multiply the wave equation (3) on the left by the operator
$\{p_0 + (m^2c^2 + p_1^2 + p_2^2 + p_3^2)^{1/2}\}$, we obtain the equation
$$\{p_0^2 - m^2c^2 - p_1^2 - p_2^2 - p_3^2\}\,|x\rangle = 0, \tag{4}$$
which is of a relativistically invariant form and may therefore more conveniently 
be taken as the basis of a relativistic theory. Equation (4) is not completely 
equivalent to equation (3) since, although every solution of (3) is also a solution 
of (4), the converse is not true. Only those solutions of (4) belonging to positive 
values for po are also solutions of (3). 
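
The relation between the two equations is easiest to see for a state of definite momentum, where the p's may be replaced by numbers. The sketch below makes that assumption (it is not a treatment of general wave functions) and uses hypothetical values of the momentum and of mc, in units chosen for convenience, to show that both signs of $p_0$ satisfy (4) while only the positive one satisfies (3).

```python
import math

def satisfies_eq3(p0, p, mc, tol=1e-12):
    """Linear-in-root condition (3): p0 - sqrt(m^2 c^2 + p^2) = 0."""
    return abs(p0 - math.sqrt(mc ** 2 + sum(c * c for c in p))) < tol

def satisfies_eq4(p0, p, mc, tol=1e-12):
    """Quadratic condition (4): p0^2 - m^2 c^2 - p^2 = 0."""
    return abs(p0 ** 2 - mc ** 2 - sum(c * c for c in p)) < tol

p = (0.3, -0.4, 1.2)     # hypothetical momentum components
mc = 1.0                 # hypothetical value of m*c in the same units
root = math.sqrt(mc ** 2 + sum(c * c for c in p))

for p0 in (root, -root):
    print(p0, satisfies_eq3(p0, p, mc), satisfies_eq4(p0, p, mc))
```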

The wave equation (4) is not of the form required by the general laws of
the quantum theory on account of its being quadratic in $p_0$. In §27 we deduced
from quite general arguments that the wave equation must be linear in the operator
$\partial/\partial t$ or $p_0$, like equation (7) of that section. We therefore seek a wave equation
that is linear in $p_0$ and that is roughly equivalent to (4). In order that this wave
equation shall transform in a simple way under a Lorentz transformation, we try
to arrange that it shall be rational and linear in $p_1$, $p_2$ and $p_3$ as well as in $p_0$,
and thus of the form
$$\{p_0 + \alpha_1p_1 + \alpha_2p_2 + \alpha_3p_3 + \beta\}\,|x\rangle = 0, \tag{5}$$
where the α's and β are independent of the p's. Since we are considering the case
of no field, all points in space-time must be equivalent, so that the operator in
the wave equation must not involve the x's. Thus the α's and β must also be
independent of the x's, so that they must commute with the p's and the x's.
They therefore describe some new degree of freedom, belonging to some internal
motion in the electron. We shall see later that they bring in the spin of the electron.
It is these degrees of freedom to which the ket |x) refers. 

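Multiplying (5) by the operator $\{p_0 - \alpha_1p_1 - \alpha_2p_2 - \alpha_3p_3 - \beta\}$ on the left,
we obtain

$$\Bigl\{p_0^2 - \sum_{123}\bigl[\alpha_1^2p_1^2 + (\alpha_1\alpha_2 + \alpha_2\alpha_1)p_1p_2 + (\alpha_1\beta + \beta\alpha_1)p_1\bigr] - \beta^2\Bigr\}\,|x\rangle = 0,$$

where $\sum_{123}$ refers to cyclic permutations of the suffixes 1, 2, 3. This is the same
as (4) if the α's and β satisfy the relations

$$\alpha_1^2 = 1, \qquad \alpha_1\alpha_2 + \alpha_2\alpha_1 = 0, \qquad \beta^2 = m^2c^2, \qquad \alpha_1\beta + \beta\alpha_1 = 0,$$

together with the relations obtained from these by permuting the suffixes 1, 2, 3.
If we write
$$\beta = \alpha_m mc,$$
these relations may be summed up in the single one,
$$\alpha_\mu\alpha_\nu + \alpha_\nu\alpha_\mu = 2\delta_{\mu\nu} \qquad (\mu, \nu = 1, 2, 3, \text{ or } m). \tag{6}$$
The four α's all anticommute with one another and the square of each is unity.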
Thus by giving suitable properties to the α's and β we can make the wave
equation (5) equivalent to (4), in so far as the motion of the electron as a whole
is concerned. We may now assume (5) is the correct relativistic wave equation
for the motion of an electron in the absence of a field. This gives rise to one
difficulty, however, owing to the fact that (5), like (4), is not exactly equivalent
to (3), but allows solutions corresponding to negative as well as positive values
of $p_0$. The former do not, of course, correspond to any actually observable motion
of an electron. For the present we shall consider only the positive-energy solutions
and shall leave the discussion of the negative-energy ones to §73.

We can easily obtain a representation of the four α's. They have similar
algebraic properties to the σ's introduced in §37, which σ's can be represented by
matrices with two rows and columns. So long as we keep to matrices with two rows
and columns we cannot get a representation of more than three anticommuting
quantities, and we have to go to four rows and columns to get a representation of
the four anticommuting α's. It is convenient first to express the α's in terms of
the σ's and also of a second similar set of three anticommuting variables whose
squares are unity, $\rho_1, \rho_2, \rho_3$ say, that are independent of and commute with the σ's.
We may take, amongst other possibilities,

$$\alpha_1 = \rho_1\sigma_1, \qquad \alpha_2 = \rho_1\sigma_2, \qquad \alpha_3 = \rho_1\sigma_3, \qquad \alpha_m = \rho_3, \tag{7}$$

and the α's will then satisfy all the relations (6), as may easily be verified. If we now
take a representation with $\rho_3$ and $\sigma_3$ diagonal, we shall get the following scheme
of matrices:

$$\sigma_1 = \begin{pmatrix}0&1&0&0\\1&0&0&0\\0&0&0&1\\0&0&1&0\end{pmatrix}, \qquad \sigma_2 = \begin{pmatrix}0&-i&0&0\\i&0&0&0\\0&0&0&-i\\0&0&i&0\end{pmatrix}, \qquad \sigma_3 = \begin{pmatrix}1&0&0&0\\0&-1&0&0\\0&0&1&0\\0&0&0&-1\end{pmatrix},$$

$$\rho_1 = \begin{pmatrix}0&0&1&0\\0&0&0&1\\1&0&0&0\\0&1&0&0\end{pmatrix}, \qquad \rho_2 = \begin{pmatrix}0&0&-i&0\\0&0&0&-i\\i&0&0&0\\0&i&0&0\end{pmatrix}, \qquad \rho_3 = \begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&-1&0\\0&0&0&-1\end{pmatrix}.$$


Corresponding to the four rows and columns there are four independent kets, 
so that the wave function will have four components. We saw in §37 that the spin 
of the electron requires the wave function to have two components. The fact 
that our present theory gives four is due to our wave equation (5) having twice 
as many solutions as it ought to have, half of them corresponding to states of 
negative energy. 
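
The statement that the α's built from the ρ's and σ's satisfy (6) "as may easily be verified" can be checked mechanically. The following is a minimal sketch using plain nested lists: it forms the 4 × 4 matrices with $\rho_3$ and $\sigma_3$ diagonal, tests the anticommutation relations, and then confirms for one set of hypothetical numerical values of $p_0$, $\mathbf{p}$ and mc that the product of the two linear factors reproduces the quadratic operator of (4). The helper names and numbers are illustrative assumptions.

```python
def kron(A, B):
    """Kronecker product of two matrices given as nested lists."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def lincomb(c1, A, c2, B):
    """Return the matrix c1*A + c2*B."""
    return [[c1 * x + c2 * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def close(A, B, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(A, B) for x, y in zip(ra, rb))

def scaled_identity(c, n=4):
    return [[c if i == j else 0 for j in range(n)] for i in range(n)]

I2 = [[1, 0], [0, 1]]
PAULI = ([[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]])

sigma = [kron(I2, s) for s in PAULI]   # sigma acts within each 2x2 block
rho = [kron(r, I2) for r in PAULI]     # rho acts between the blocks

# alpha_1, alpha_2, alpha_3 = rho_1 sigma_r, and alpha_m = rho_3
alpha = [matmul(rho[0], sigma[r]) for r in range(3)] + [rho[2]]

anticommutation_ok = all(
    close(lincomb(1, matmul(alpha[m], alpha[n]), 1, matmul(alpha[n], alpha[m])),
          scaled_identity(2 if m == n else 0))
    for m in range(4) for n in range(4)
)

# Hypothetical numerical values standing in for the commuting p's and for mc.
p0, p, mc = 1.7, (0.3, -0.4, 1.2), 0.9
X = scaled_identity(0)
for r in range(3):
    X = lincomb(1, X, p[r], alpha[r])
X = lincomb(1, X, mc, alpha[3])        # X = alpha.p + alpha_m m c

product = matmul(lincomb(p0, scaled_identity(1), -1, X),
                 lincomb(p0, scaled_identity(1), 1, X))
expected = scaled_identity(p0 ** 2 - sum(c * c for c in p) - mc ** 2)
factorisation_ok = close(product, expected)
print(anticommutation_ok, factorisation_ok)
```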

With the help of (7), the wave equation (5) may be written with
three-dimensional vector notation
$$\{p_0 + \rho_1(\boldsymbol\sigma, \mathbf{p}) + \rho_3mc\}\,|x\rangle = 0. \tag{8}$$
To generalize this equation to the case when there is an electromagnetic
field present, we follow the classical rule of replacing $p_0$ and $\mathbf{p}$ by $p_0 + (e/c)A_0$
and $\mathbf{p} + (e/c)\mathbf{A}$, $A_0$ and $\mathbf{A}$ being the scalar and vector potentials of the field at
the place where the electron is. This gives us the equation

$$\left\{p_0 + \frac{e}{c}A_0 + \rho_1\left(\boldsymbol\sigma,\ \mathbf{p} + \frac{e}{c}\mathbf{A}\right) + \rho_3mc\right\}|x\rangle = 0, \tag{9}$$

which is the fundamental wave equation of the relativistic theory of the electron.
The conjugate imaginary of equation (9) reads

$$\langle x|\left\{p_0 + \frac{e}{c}A_0 + \rho_1\left(\boldsymbol\sigma,\ \mathbf{p} + \frac{e}{c}\mathbf{A}\right) + \rho_3mc\right\} = 0, \tag{10}$$


in which the operators p operate to the left. An operator of differentiation 
operating to the left must be interpreted according to (24) of §22. 


68. Invariance under a Lorentz transformation 
Before proceeding to discuss the physical consequences of the wave equation 
(9) or (10), we shall first verify that our theory really is invariant under a 
Lorentz transformation, or, stated more accurately, that the physical results the 
theory leads to are independent of the Lorentz frame of reference used. This is 
not by any means obvious from the form of the wave equation (9). We have 
to verify that, if we write down the wave equation in a different Lorentz frame, 
the solutions of the new wave equation may be put into one-one correspondence 
with those of the original one in such a way that corresponding solutions may 
be assumed to represent the same state. For either Lorentz frame, the square of 
the length of the ket |x) should give the probability per unit volume of the electron 
being at the place x in that Lorentz frame. We may call this the probability 
density. Its values, calculated in different Lorentz frames for wave functions 
representing the same state, should be connected like the time components in 
these frames of some 4-vector. Further, the 4-dimensional divergence of this 
4-vector should vanish, signifying conservation of the electron, or that the electron 
cannot appear or disappear in any volume without passing through the boundary.†

For discussing Lorentz transformations it is convenient to make the convention 
that terms containing a repeated suffix are to be summed over the values 0, 1, 2, 3 
for that suffix. This enables us to write equation (9) in the form 


$$\{\alpha_\mu(p_\mu + (e/c)A_\mu) + \alpha_mmc\}\,|x\rangle = 0, \tag{11}$$
$\alpha_0$ being equal to unity, and similarly we can write equation (10) in the form
$$\langle x|\,\{\alpha_\mu(p_\mu + (e/c)A_\mu) + \alpha_mmc\} = 0. \tag{12}$$


We now apply a Lorentz transformation and denote quantities referring to
the new frame by a star [e.g. $p_\mu^*$]. The components of the 4-vectors p and A will
transform according to a linear law of the type

$$p_\mu = a_{\mu\nu}p_\nu^*, \qquad A_\mu = a_{\mu\nu}A_\nu^*. \tag{13}$$

Substituting these expressions for $p_\mu$ and $A_\mu$ in equations (11) and (12), we obtain

$$\{\alpha_\mu a_{\mu\nu}(p_\nu^* + (e/c)A_\nu^*) + \alpha_mmc\}\,|x\rangle = 0$$
and
$$\langle x|\,\{\alpha_\mu a_{\mu\nu}(p_\nu^* + (e/c)A_\nu^*) + \alpha_mmc\} = 0. \tag{14}$$

We now try to bring these equations back to the form of the original (11) and (12)
by making a transformation
$$|x^*\rangle = \gamma\,|x\rangle, \tag{15}$$


†[The rest of this section is substantially rewritten in the 4th edition.]


where γ is a linear operator in the internal degrees of freedom and is independent
of the x's and p's. The conjugate imaginary equation to (15) is

$$\langle x^*| = \langle x|\,\bar\gamma. \tag{16}$$

Equations (14) will go over into the equations

$$\bar\gamma\,\{\alpha_\nu(p_\nu^* + (e/c)A_\nu^*) + \alpha_mmc\}\,|x^*\rangle = 0$$
and
$$\langle x^*|\,\{\alpha_\nu(p_\nu^* + (e/c)A_\nu^*) + \alpha_mmc\}\,\gamma = 0 \tag{17}$$

provided we can choose γ such that

$$\bar\gamma\alpha_\nu\gamma = \alpha_\mu a_{\mu\nu}, \qquad \bar\gamma\alpha_m\gamma = \alpha_m. \tag{18}$$

These equations (17) are of the same form as (11) and (12), as required,
since one can divide out by the extra factors $\bar\gamma$ and γ. The transformation given by
(15), (16) and (18) is something like a unitary transformation, but is more general
since γ does not satisfy the unitary condition.

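In order to verify that we can choose γ to satisfy the equations (18), let us
first take the special case when the change of our frame of reference consists
simply of a rotation through a hyperbolic angle θ in the $x_0x_1$-plane, so that
the transformation equations for the components of a 4-vector are of the type

$$p_0 = p_0^*\cosh\theta + p_1^*\sinh\theta, \qquad p_1 = p_0^*\sinh\theta + p_1^*\cosh\theta, \qquad p_2 = p_2^*, \qquad p_3 = p_3^*. \tag{19}$$

The values of the $a_{\mu\nu}$ may be written down at once from a comparison of these
equations with (13). With these values for the $a_{\mu\nu}$ it is easy to see that equations
(18) hold when we take
$$\gamma = \bar\gamma = e^{\frac12\theta\alpha_1}. \tag{20}$$
We have, in fact,
$$\bar\gamma\alpha_0\gamma = \bar\gamma\gamma = e^{\theta\alpha_1} = 1 + \theta\alpha_1 + \theta^2\alpha_1^2/2! + \theta^3\alpha_1^3/3! + \cdots.$$
On account of $\alpha_1^2 = 1$, this reduces to

$$\bar\gamma\alpha_0\gamma = \{1 + \theta^2/2! + \cdots\} + \alpha_1\{\theta + \theta^3/3! + \cdots\} = \cosh\theta + \alpha_1\sinh\theta = \alpha_0\cosh\theta + \alpha_1\sinh\theta.$$

Again,
$$\bar\gamma\alpha_1\gamma = \alpha_1\bar\gamma\gamma = \alpha_0\sinh\theta + \alpha_1\cosh\theta.$$
Further,
$$\bar\gamma\alpha_2\gamma = e^{\frac12\theta\alpha_1}\alpha_2e^{\frac12\theta\alpha_1} = e^{\frac12\theta\alpha_1}e^{-\frac12\theta\alpha_1}\alpha_2 = \alpha_2,$$
since $\alpha_2$ anticommutes with $\alpha_1$, which results in $\alpha_2f(\alpha_1) = f(-\alpha_1)\alpha_2$ for any
function $f(\alpha_1)$ of $\alpha_1$. Similarly,

$$\bar\gamma\alpha_3\gamma = \alpha_3, \qquad \bar\gamma\alpha_m\gamma = \alpha_m.$$

Thus the five equations (18) hold with γ given by (20) when the $a_{\mu\nu}$ are given
by (19).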
As a second typical change of the frame of reference, we may consider a rotation
through an angle θ in ordinary space about the $x_1$-axis. The transformation
equations are now

$$p_0 = p_0^*, \qquad p_1 = p_1^*, \qquad p_2 = p_2^*\cos\theta + p_3^*\sin\theta, \qquad p_3 = -p_2^*\sin\theta + p_3^*\cos\theta.$$

With the new values for the $a_{\mu\nu}$, we can easily verify that equations (18) hold with
$$\gamma = e^{-\frac12\theta\alpha_2\alpha_3}, \qquad \bar\gamma = e^{-\frac12\theta\alpha_3\alpha_2} = e^{\frac12\theta\alpha_2\alpha_3},$$
the analysis being very similar to the preceding case. 
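
Both cases can be confirmed numerically. The sketch below is a check under stated assumptions rather than part of the original argument: it uses the 4 × 4 representation with $\rho_3$ and $\sigma_3$ diagonal, writes the exponentials in closed form ($e^{\frac12\theta\alpha_1} = \cosh\tfrac12\theta + \alpha_1\sinh\tfrac12\theta$ because $\alpha_1^2 = 1$, and $e^{-\frac12\theta\alpha_2\alpha_3} = \cos\tfrac12\theta - \alpha_2\alpha_3\sin\tfrac12\theta$ because $(\alpha_2\alpha_3)^2 = -1$), takes $\bar\gamma$ as the conjugate transpose of γ, and tests $\bar\gamma\alpha_\nu\gamma = \alpha_\mu a_{\mu\nu}$ and $\bar\gamma\alpha_m\gamma = \alpha_m$ for an arbitrary angle. Helper names and the angle are illustrative.

```python
import math

def kron(A, B):
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def lincomb(c1, A, c2, B):
    return [[c1 * x + c2 * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def dagger(A):
    """Conjugate transpose, playing the role of the conjugate imaginary."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def close(A, B, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(A, B) for x, y in zip(ra, rb))

I2 = [[1, 0], [0, 1]]
PAULI = ([[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]])
I4 = kron(I2, I2)
ZERO4 = [[0] * 4 for _ in range(4)]
rho_1, rho_3 = kron(PAULI[0], I2), kron(PAULI[2], I2)
# alpha_0 = 1, alpha_r = rho_1 sigma_r, alpha_m = rho_3
alpha = [I4] + [matmul(rho_1, kron(I2, s)) for s in PAULI]
alpha_m = rho_3

def satisfies_18(gamma, a):
    """Check gamma-bar alpha_nu gamma = alpha_mu a[mu][nu] and the alpha_m relation."""
    gbar = dagger(gamma)
    ok = close(matmul(matmul(gbar, alpha_m), gamma), alpha_m)
    for nu in range(4):
        rhs = ZERO4
        for mu in range(4):
            rhs = lincomb(1, rhs, a[mu][nu], alpha[mu])
        ok = ok and close(matmul(matmul(gbar, alpha[nu]), gamma), rhs)
    return ok

theta = 0.7                                   # arbitrary test angle
ch, sh = math.cosh(theta), math.sinh(theta)
co, si = math.cos(theta), math.sin(theta)

gamma_boost = lincomb(math.cosh(theta / 2), I4, math.sinh(theta / 2), alpha[1])
a_boost = [[ch, sh, 0, 0], [sh, ch, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

gamma_rot = lincomb(math.cos(theta / 2), I4,
                    -math.sin(theta / 2), matmul(alpha[2], alpha[3]))
a_rot = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, co, si], [0, 0, -si, co]]

boost_ok = satisfies_18(gamma_boost, a_boost)
rotation_ok = satisfies_18(gamma_rot, a_rot)
boost_is_unitary = close(matmul(dagger(gamma_boost), gamma_boost), I4)
print(boost_ok, rotation_ok, boost_is_unitary)
```

The last flag illustrates the remark that γ need not satisfy the unitary condition: for the hyperbolic rotation $\bar\gamma\gamma = e^{\theta\alpha_1}$, which differs from unity, whereas for the spatial rotation γ is unitary.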

If two changes of the frame of reference are made consecutively, we simply have
to multiply the corresponding γ's to get the γ for the resultant change. Now any
change of the frame of reference may be built up from two rotations of the types
we have considered, and hence there will always be a γ satisfying (18).

In this way we see that the solutions of the wave equations in the new frame of 
reference, equations (17), can be put into a natural one-one correspondence with 
those of the original wave equations (11) and (12), corresponding solutions being 
connected by (15) and (16), and we may assume that corresponding solutions 
represent the same state. It remains for us to verify that the probability density 
transforms like the time component of a 4-vector and that the divergence of 
this 4-vector vanishes. 

The probability density is $\langle x|x\rangle = \langle x|\alpha_0|x\rangle$ since $\alpha_0 = 1$. Let us see
how the four quantities $\langle x|\alpha_\mu|x\rangle$, with μ = 0, 1, 2, 3, transform under a Lorentz
transformation. We have, from (15), (16) and (18),

$$\langle x^*|\alpha_\nu|x^*\rangle = \langle x|\bar\gamma\alpha_\nu\gamma|x\rangle = \langle x|\alpha_\mu a_{\mu\nu}|x\rangle = \langle x|\alpha_\mu|x\rangle\,a_{\mu\nu}.$$

Comparing this result with (13), we see that the four quantities $\langle x|\alpha_\mu|x\rangle$ transform
like the covariant components of a 4-vector (as defined in §74). The contravariant
components will be

$$\langle x|x\rangle, \qquad -\langle x|\alpha_1|x\rangle, \qquad -\langle x|\alpha_2|x\rangle, \qquad -\langle x|\alpha_3|x\rangle. \tag{21}$$

This verifies that the probability density $\langle x|x\rangle$ is the time component of a 4-vector
and that the corresponding space components are $-\langle x|\alpha_r|x\rangle$ (with r = 1, 2, 3).
These space components multiplied by the factor c give the probability current,
or the probability of the electron crossing unit area per unit time.


The divergence of the 4-vector is
$$\pm\frac{\partial}{\partial x_\mu}\langle x|\alpha_\mu|x\rangle, \tag{22}$$
where the ± sign means that the + sign is to be taken for μ = 0 and the − sign
for μ = 1, 2, 3 before one does the summation. To prove this divergence vanishes,
multiply equation (11) by $\langle x|$ on the left and (12) by $|x\rangle$ on the right and subtract.
The result is†
$$\langle x|\,(\alpha_\mu p_\mu|x\rangle) - (\langle x|\,\alpha_\mu p_\mu)\,|x\rangle = 0,$$


†[The dots have been replaced by round brackets.]


the brackets denoting that $p_\mu$ operates to the right on $|x\rangle$ in the first term and to
the left on $\langle x|$ in the second. With the help of (1) and (2) and the interpretation
(24) of §22 for operators of differentiation operating to the left, this gives

$$\pm\left\{\langle x|\,\alpha_\mu\frac{\partial|x\rangle}{\partial x_\mu} + \frac{\partial\langle x|}{\partial x_\mu}\,\alpha_\mu|x\rangle\right\} = 0,$$
which just expresses the vanishing of (22). In this way we complete the proof that 
our theory gives consistent results in whichever frame of reference it is applied. 


69. The motion of a free electron 

It is of interest to consider the motion of a free electron* in the above theory
according to the Heisenberg picture and to study the Heisenberg equations
of motion. These equations of motion can be integrated exactly, as was first done
by Erwin Schrödinger.† For brevity we shall omit the suffix $t$ which the notation
of §28 requires to be inserted in dynamical variables that vary with time in
the Heisenberg picture.

As Hamiltonian we must take the expression which we get as equal to $cp_0$ when
we put the operator on $|x\rangle$ in (8) equal to zero, i.e.

$$H = -c\rho_1(\boldsymbol\sigma, \mathbf p) - \rho_3 mc^2 = -c(\boldsymbol\alpha, \mathbf p) - \rho_3 mc^2. \tag{23}$$
We see at once that the momentum commutes with $H$ and is thus a constant of
the motion. Further, the $x_1$-component of the velocity is

$$\dot x_1 = [x_1, H] = -c\alpha_1. \tag{24}$$

This result is rather surprising, as it means an altogether different relation between
velocity and momentum from what one has in classical mechanics. It is connected,
however, with the expressions (21) for the probability density and current. The $\dot x_1$
given by (24) has as eigenvalues $\pm c$, corresponding to the eigenvalues $\pm 1$ of $\alpha_1$.
As $\dot x_2$ and $\dot x_3$ are similar, we can conclude that a measurement of a component of
the velocity of a free electron is certain to lead to the result $\pm c$. This conclusion is
easily seen to hold also when there is a field present.
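
The statement that $\dot x_1 = -c\alpha_1$ has only the eigenvalues $\pm c$ can be checked numerically. The following Python sketch is only an illustration: the explicit $4\times4$ matrices (Pauli matrices combined by Kronecker products so that $\alpha_k = \rho_1\sigma_k$) and the unit choice $c = 1$ are assumptions made here, not something fixed by the text. Any representation satisfying the same algebra would give the same eigenvalues.

```python
import numpy as np

# Pauli matrices. The 4x4 operators are built as Kronecker products so that
# every rho commutes with every sigma (an assumed explicit representation).
S1 = np.array([[0, 1], [1, 0]], dtype=complex)
S2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
S3 = np.array([[1, 0], [0, -1]], dtype=complex)


def alpha(k):
    """Return alpha_k = rho_1 sigma_k as a 4x4 matrix (k = 1, 2, 3)."""
    return np.kron(S1, (S1, S2, S3)[k - 1])


c = 1.0  # velocity of light in illustrative units (assumption)

# Eigenvalues of each velocity component -c * alpha_k, sorted ascending.
velocity_eigenvalues = {
    k: np.sort(np.linalg.eigvalsh(-c * alpha(k))) for k in (1, 2, 3)
}

for k, eig in velocity_eigenvalues.items():
    print(f"eigenvalues of velocity component {k}:", np.round(eig, 12))
```

Each component has the doubly degenerate eigenvalues $-c$ and $+c$, and different components anticommute, so they cannot be given definite values simultaneously.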

Since electrons are observed in practice to have velocities considerably less than 
that of light, it would seem that we have here a contradiction with experiment. 
The contradiction is not real, though, since the theoretical velocity in the above 
conclusion is the velocity at one instant of time while observed velocities are always 
average velocities through appreciable time intervals. We shall find upon further 
examination of the equations of motion that the velocity is not at all constant, 
but oscillates rapidly about a mean value which agrees with the observed value. 


*[The arithmetic signs of equation terms of the following sections have been revised in the
fourth edition.]

†Schrödinger, Erwin, „Über die kräftefreie Bewegung in der relativistischen
Quantenmechanik“, Sitzungsberichte der Preußischen Akademie der Wissenschaften,
Physikalisch-mathematische Klasse (1930), pp. 418-428 { OCLC Number: 40202584
URL: https://books.google.co.uk/books?id=QhMXAQAAMAAJ }

It may easily be verified that a measurement of a component of the velocity
must lead to the result $\pm c$ in a relativistic theory, simply from an elementary
application of the principle of uncertainty of §24. To measure the velocity we must
measure the position at two slightly different times and then divide the change of 
position by the time interval. (It will not do to measure the momentum and apply 
a formula, as the ordinary connexion between velocity and momentum is not valid.) 
In order that our measured velocity may approximate to the instantaneous velocity, 
the time interval between the two measurements of position must be very short 
and hence these measurements must be very accurate. The great accuracy with 
which the position of the electron is known during the time-interval must give rise, 
according to the principle of uncertainty, to an almost complete indeterminacy in 
its momentum. This means that almost all values of the momentum are equally 
probable, so that the momentum is almost certain to be infinite. An infinite value 
for a component of momentum corresponds to the value $\pm c$ for the corresponding
component of velocity. 

Let us now examine how the velocity of the electron varies with time. We have

$$i\hbar\dot\alpha_1 = \alpha_1 H - H\alpha_1.$$

Now since $\alpha_1$ anticommutes with all the terms in $H$ except $-c\alpha_1 p_1$,

$$\alpha_1 H + H\alpha_1 = -\alpha_1 c\alpha_1 p_1 - c\alpha_1 p_1\alpha_1 = -2cp_1,$$

and hence

$$i\hbar\dot\alpha_1 = 2\alpha_1 H + 2cp_1 = -2H\alpha_1 - 2cp_1. \tag{25}$$

Since $H$ and $p_1$ are constants, it follows from the first of equations (25) that

$$i\hbar\ddot\alpha_1 = 2\dot\alpha_1 H. \tag{26}$$

This differential equation in $\dot\alpha_1$ can be integrated immediately, the result being

$$\dot\alpha_1 = \dot\alpha_1^0 e^{-2iHt/\hbar}, \tag{27}$$

where $\dot\alpha_1^0$ is a constant, equal to the value of $\dot\alpha_1$ when $t = 0$. The factor $e^{-2iHt/\hbar}$
must be put to the right of the factor $\dot\alpha_1^0$ in (27) on account of the $H$ occurring to
the right of the $\dot\alpha_1$ in (26). The second of equations (25) leads in the same way to
the result

$$\dot\alpha_1 = e^{2iHt/\hbar}\dot\alpha_1^0.$$

We can now easily complete the integration of the equation of motion for $x_1$.
From (27) and the first of equations (25)

$$\alpha_1 = \tfrac12 i\hbar\dot\alpha_1^0 e^{-2iHt/\hbar}H^{-1} - cp_1H^{-1}, \tag{28}$$

and hence the time-integral of equation (24) is

$$x_1 = \tfrac14 c\hbar^2\dot\alpha_1^0 e^{-2iHt/\hbar}H^{-2} + c^2p_1H^{-1}t + a_1, \tag{29}$$

$a_1$ being a constant.
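
The integrated form (28) can be compared numerically with the direct Heisenberg evolution $\alpha_1(t) = e^{iHt/\hbar}\alpha_1 e^{-iHt/\hbar}$. The sketch below is a check under stated assumptions: the electron is taken in a momentum eigenstate so that $\mathbf p$ is an ordinary vector, the units $\hbar = c = m = 1$ and the particular momentum are arbitrary illustrative choices, and the $4\times4$ matrices are one assumed representation with $\alpha_k = \rho_1\sigma_k$.

```python
import numpy as np

hbar, c, m = 1.0, 1.0, 1.0          # illustrative units (assumption)
p = np.array([0.3, -0.2, 0.5])      # an arbitrary momentum eigenvalue (assumption)

S = [np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]], dtype=complex),
     np.array([[1, 0], [0, -1]], dtype=complex)]
I2 = np.eye(2, dtype=complex)
ALPHA = [np.kron(S[0], s) for s in S]   # alpha_k = rho_1 sigma_k
RHO3 = np.kron(S[2], I2)

# Free Hamiltonian (23) restricted to a momentum eigenstate, so p is a number.
H = -c * sum(pk * ak for pk, ak in zip(p, ALPHA)) - RHO3 * m * c**2
H_INV = np.linalg.inv(H)


def exp_i_theta_H(theta):
    """Return exp(i*theta*H) for the Hermitian matrix H via its eigen-decomposition."""
    w, v = np.linalg.eigh(H)
    return (v * np.exp(1j * theta * w)) @ v.conj().T


a1 = ALPHA[0]
a1_dot_0 = (a1 @ H - H @ a1) / (1j * hbar)   # value of d(alpha_1)/dt at t = 0


def alpha1_heisenberg(t):
    """alpha_1 at time t from the Heisenberg picture definition."""
    U = exp_i_theta_H(t / hbar)
    return U @ a1 @ U.conj().T


def alpha1_from_28(t):
    """alpha_1 at time t from the closed form (28)."""
    return (0.5j * hbar * a1_dot_0 @ exp_i_theta_H(-2 * t / hbar) @ H_INV
            - c * p[0] * H_INV)


max_error = max(
    np.abs(alpha1_heisenberg(t) - alpha1_from_28(t)).max()
    for t in np.linspace(0.0, 5.0, 11)
)
print("largest deviation from (28):", max_error)
```

The deviation is at the level of rounding error, which is what (28) predicts when $H$ and $p_1$ are constants of the motion.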

From (28) we see that the $x_1$ component of velocity, $-c\alpha_1$, consists of two parts,
a constant part $c^2p_1H^{-1}$, connected with the momentum by the classical relativistic
formula, and an oscillatory part $-\tfrac12 ic\hbar\dot\alpha_1^0e^{-2iHt/\hbar}H^{-1}$, whose frequency is high,
being $2H/\hbar$, which is at least $2mc^2/\hbar$. Only the constant part would be observed
in a practical measurement of velocity, such a measurement giving the average
velocity through a time-interval much larger than $\hbar/2mc^2$. The oscillatory
part secures that the instantaneous value of $\dot x_1$ shall have the eigenvalues $\pm c$.
The oscillatory part of $x_1$ is small, being, according to (29),

$$\tfrac14 c\hbar^2\dot\alpha_1^0 e^{-2iHt/\hbar}H^{-2} = -\tfrac12 ic\hbar(\alpha_1 + cp_1H^{-1})H^{-1},$$

which is of the order of magnitude $\hbar/mc$, since $(\alpha_1 + cp_1H^{-1})$ is of the order of
magnitude unity.
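
To give a feeling for the orders of magnitude just quoted, the short Python sketch below evaluates the minimum oscillation frequency $2mc^2/\hbar$ and the length $\hbar/mc$ for an electron. The numerical constants are standard SI values supplied here as an assumption; they do not appear in the text.

```python
# Standard SI values of the constants (assumed here, not taken from the text).
HBAR = 1.054571817e-34      # reduced Planck constant, J s
M_E = 9.1093837015e-31      # electron mass, kg
C = 2.99792458e8            # velocity of light, m/s

# Lowest possible angular frequency 2H/hbar of the oscillatory part, reached
# when H has its minimum magnitude mc^2.
omega_min = 2 * M_E * C**2 / HBAR          # rad/s

# Order of magnitude hbar/(mc) of the oscillatory part of the coordinate.
amplitude_scale = HBAR / (M_E * C)         # m

print(f"2mc^2/hbar = {omega_min:.3e} rad/s")
print(f"hbar/(mc)  = {amplitude_scale:.3e} m")
```

The oscillation is thus far too rapid and too small to be resolved by any practical measurement of velocity, in line with the argument above.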


70. Existence of the spin 

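In §67 we saw that the correct wave equation for the electron in the absence
of an electromagnetic field, namely equation (5) or (8), is equivalent to
the wave equation (4) which is suggested from analogy with the classical theory.
This equivalence no longer holds when there is a field. The wave equation to be
expected from analogy with the classical theory in this case is

$$\left\{\left(p_0 + \frac{e}{c}A_0\right)^2 - \left(\mathbf p + \frac{e}{c}\mathbf A\right)^2 - m^2c^2\right\}|x\rangle = 0, \tag{30}$$

in which the operator is just the classical relativistic Hamiltonian. If we multiply
(9) by some factor on the left to make it resemble (30) as closely as possible,
namely the factor

$$p_0 + \frac{e}{c}A_0 - \rho_1\left(\boldsymbol\sigma,\ \mathbf p + \frac{e}{c}\mathbf A\right) - \rho_3 mc,$$

we get

$$\left\{\left(p_0 + \frac{e}{c}A_0\right)^2 - \left(\boldsymbol\sigma,\ \mathbf p + \frac{e}{c}\mathbf A\right)^2 - m^2c^2 + \rho_1\left[\left(p_0 + \frac{e}{c}A_0\right)\left(\boldsymbol\sigma,\ \mathbf p + \frac{e}{c}\mathbf A\right) - \left(\boldsymbol\sigma,\ \mathbf p + \frac{e}{c}\mathbf A\right)\left(p_0 + \frac{e}{c}A_0\right)\right]\right\}|x\rangle = 0. \tag{31}$$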
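We now use the general formula that, if $\mathbf B$ and $\mathbf C$ are any two three-dimensional
vectors that commute with $\boldsymbol\sigma$,

$$(\boldsymbol\sigma,\mathbf B)(\boldsymbol\sigma,\mathbf C) = \sum_{123}\{\sigma_1^2B_1C_1 + \sigma_1\sigma_2B_1C_2 + \sigma_2\sigma_1B_2C_1\},$$

the summation referring to cyclic permutations of the suffixes 1, 2, 3, or

$$(\boldsymbol\sigma,\mathbf B)(\boldsymbol\sigma,\mathbf C) = (\mathbf B,\mathbf C) + i\sum_{123}\sigma_3(B_1C_2 - B_2C_1) = (\mathbf B,\mathbf C) + i(\boldsymbol\sigma,\mathbf B\times\mathbf C). \tag{32}$$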
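
Formula (32) is easy to spot-check with explicit Pauli matrices. In the Python sketch below $\mathbf B$ and $\mathbf C$ are ordinary random numerical vectors (an assumption made for the test), so their components commute with each other as well as with $\boldsymbol\sigma$. The operator case used in the text, where $\mathbf B\times\mathbf B$ need not vanish, is not covered by this check.

```python
import numpy as np

SIGMA = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]


def sigma_dot(v):
    """Return (sigma, v) = sigma_1 v_1 + sigma_2 v_2 + sigma_3 v_3 for a numerical vector v."""
    return sum(vk * sk for vk, sk in zip(v, SIGMA))


rng = np.random.default_rng(0)          # fixed seed so the check is repeatable
B = rng.normal(size=3)
C = rng.normal(size=3)

lhs = sigma_dot(B) @ sigma_dot(C)
rhs = np.dot(B, C) * np.eye(2) + 1j * sigma_dot(np.cross(B, C))

identity_holds = np.allclose(lhs, rhs)
print("formula (32) holds for the sample vectors:", identity_holds)
```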
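Taking $\mathbf B = \mathbf C = \mathbf p + (e/c)\mathbf A$, we find, since

$$\left(\mathbf p + \frac{e}{c}\mathbf A\right)\times\left(\mathbf p + \frac{e}{c}\mathbf A\right) = \frac{e}{c}\{\mathbf p\times\mathbf A + \mathbf A\times\mathbf p\} = -i\hbar(e/c)\operatorname{curl}\mathbf A = -i\hbar(e/c)\boldsymbol{\mathcal H},$$

where $\boldsymbol{\mathcal H}$ is the magnetic field, that

$$\left(\boldsymbol\sigma,\ \mathbf p + \frac{e}{c}\mathbf A\right)^2 = \left(\mathbf p + \frac{e}{c}\mathbf A\right)^2 + \frac{\hbar e}{c}(\boldsymbol\sigma,\boldsymbol{\mathcal H}). \tag{33}$$

Also we have

$$\left(p_0 + \frac{e}{c}A_0\right)\left(\boldsymbol\sigma,\ \mathbf p + \frac{e}{c}\mathbf A\right) - \left(\boldsymbol\sigma,\ \mathbf p + \frac{e}{c}\mathbf A\right)\left(p_0 + \frac{e}{c}A_0\right) = \frac{e}{c}(\boldsymbol\sigma,\ p_0\mathbf A - \mathbf Ap_0 + A_0\mathbf p - \mathbf pA_0)$$

$$= i\hbar\frac{e}{c}\left(\boldsymbol\sigma,\ \frac{1}{c}\frac{\partial\mathbf A}{\partial t} + \operatorname{grad}A_0\right) = -i\hbar\frac{e}{c}(\boldsymbol\sigma,\boldsymbol{\mathcal E}),$$

where $\boldsymbol{\mathcal E}$ is the electric field. Thus (31) becomes

$$\left\{\left(p_0 + \frac{e}{c}A_0\right)^2 - \left(\mathbf p + \frac{e}{c}\mathbf A\right)^2 - m^2c^2 - \frac{\hbar e}{c}(\boldsymbol\sigma,\boldsymbol{\mathcal H}) - i\rho_1\frac{\hbar e}{c}(\boldsymbol\sigma,\boldsymbol{\mathcal E})\right\}|x\rangle = 0. \tag{34}$$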


This equation differs from (30) through having two extra terms in the operator. 
These extra terms involve some new physical effects, but since they are not real 
they do not lend themselves very directly to physical interpretation. 

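To get an understanding of the physical features involved in the difference
between (34) and (30)† it is better to work with the Heisenberg picture, this picture
being always the more suitable one for comparisons between classical and
quantum mechanics. The Heisenberg equations of motion are determined by
the Hamiltonian

$$H = -eA_0 - c\rho_1\left(\boldsymbol\sigma,\ \mathbf p + \frac{e}{c}\mathbf A\right) - \rho_3mc^2, \tag{35}$$

the generalization of (23) to the case when there is a field. Equation (35) gives

$$\left(\frac{H}{c} + \frac{e}{c}A_0\right)^2 = \left\{\rho_1\left(\boldsymbol\sigma,\ \mathbf p + \frac{e}{c}\mathbf A\right) + \rho_3mc\right\}^2 = \left(\boldsymbol\sigma,\ \mathbf p + \frac{e}{c}\mathbf A\right)^2 + m^2c^2$$

$$= \left(\mathbf p + \frac{e}{c}\mathbf A\right)^2 + m^2c^2 + \frac{\hbar e}{c}(\boldsymbol\sigma,\boldsymbol{\mathcal H}), \tag{36}$$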
with the help of (33). We have here the real part of the extra terms in (34)
appearing without the‡ imaginary part. For an electron moving slowly (i.e. with
small momentum), we may expect the Heisenberg equations of motion to be
determined by a Hamiltonian of the form $mc^2 + H_1$, where $H_1$ is small compared
with $mc^2$. Putting $mc^2 + H_1$ for $H$ in (36) and neglecting $H_1^2$ and other terms
involving $c^{-2}$, we get, on dividing by $2m$,

$$H_1 + eA_0 = \frac{1}{2m}\left(\mathbf p + \frac{e}{c}\mathbf A\right)^2 + \frac{\hbar e}{2mc}(\boldsymbol\sigma,\boldsymbol{\mathcal H}). \tag{37}$$

The Hamiltonian $H_1$ given by (37) is the same as the classical Hamiltonian for
a slow electron, except for the last term

$$\frac{\hbar e}{2mc}(\boldsymbol\sigma,\boldsymbol{\mathcal H}).$$

This term may be considered as an additional potential energy which a slow
electron has in the quantum theory and may be interpreted as arising from
the electron having a magnetic moment $-(\hbar e/2mc)\boldsymbol\sigma$. This magnetic moment
is the one assumed in §§41 and 47 for dealing with the Zeeman effect and is in
agreement with experiment.

†[The original refers to (31) when the comparison is easier to (30) as referred to in
the previous paragraph.]
‡[‘pure’ omitted.]

The spin angular momentum does not give rise to any potential energy and 
therefore does not appear in the result of the preceding calculation. The simplest 
way of showing the existence of the spin angular momentum is to take the case of 
the motion of a free electron or an electron in a central field of force and determine 
the angular momentum integrals. This means working with the Hamiltonian (23), 
or with the Hamiltonian (35) with $\mathbf A = 0$ and $A_0$ a function of the radius $r$, i.e.

$$H = -eA_0(r) - c\rho_1(\boldsymbol\sigma,\mathbf p) - \rho_3mc^2, \tag{38}$$

and obtaining the Heisenberg equations of motion for the angular momentum.
With either Hamiltonian we find for the rate of change of the $x_1$-component of
orbital angular momentum, $m_1 = x_2p_3 - x_3p_2$, with the help of commutation
relations proved in §35,

$$i\hbar\dot m_1 = m_1H - Hm_1 = -c\rho_1\{m_1(\boldsymbol\sigma,\mathbf p) - (\boldsymbol\sigma,\mathbf p)m_1\} = -c\rho_1(\boldsymbol\sigma,\ m_1\mathbf p - \mathbf pm_1) = -i\hbar c\rho_1\{\sigma_2p_3 - \sigma_3p_2\}.$$

Thus $\dot m_1 \neq 0$ and the orbital angular momentum is not a constant of the motion.
This result is to be expected from the integrated equation of motion (29), 
the oscillatory part of the motion here displayed giving rise to an oscillatory term 


in the angular momentum. 
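We have further

$$i\hbar\dot\sigma_1 = \sigma_1H - H\sigma_1 = -c\rho_1\{\sigma_1(\boldsymbol\sigma,\mathbf p) - (\boldsymbol\sigma,\mathbf p)\sigma_1\} = -c\rho_1(\sigma_1\boldsymbol\sigma - \boldsymbol\sigma\sigma_1,\ \mathbf p) = -2ic\rho_1\{\sigma_3p_2 - \sigma_2p_3\}$$

with the help of equations (51) of §37. Hence $\dot m_1 + \tfrac12\hbar\dot\sigma_1 = 0$, so that the vector
$\mathbf m + \tfrac12\hbar\boldsymbol\sigma$ is a constant of the motion. This result one can interpret by saying
the electron has a spin angular momentum $\tfrac12\hbar\boldsymbol\sigma$, which must be added to the orbital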
angular momentum m before one gets a constant of the motion. The spin angular 
momentum could alternatively be obtained from the rotation operators for states 
of spin in accordance with the general method of §35. 
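
The cancellation between the orbital and spin contributions can be checked with explicit matrices. The Python sketch below is only a partial check under stated assumptions: the orbital term $i\hbar\dot m_1 = -i\hbar c\rho_1\{\sigma_2p_3 - \sigma_3p_2\}$ is taken over from the text (the operator $m_1$ involves the coordinates and has no finite matrix form), while the spin term $\tfrac12\hbar(\sigma_1H - H\sigma_1)$ is computed from the free Hamiltonian (23) for a numerical momentum. The units and the sample momentum are arbitrary illustrative choices.

```python
import numpy as np

hbar, c, m = 1.0, 1.0, 1.0           # illustrative units (assumption)
p = np.array([0.7, -0.4, 0.2])       # sample numerical momentum (assumption)

S = [np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]], dtype=complex),
     np.array([[1, 0], [0, -1]], dtype=complex)]
I2 = np.eye(2, dtype=complex)

SIGMA = [np.kron(I2, s) for s in S]  # sigma_k acting on the spin index
RHO1 = np.kron(S[0], I2)             # rho's commute with all sigma's
RHO3 = np.kron(S[2], I2)

sigma_p = sum(pk * sk for pk, sk in zip(p, SIGMA))
H = -c * RHO1 @ sigma_p - RHO3 * m * c**2

# i*hbar times the rate of change of m_1, as quoted in the text.
orbital_term = -1j * hbar * c * RHO1 @ (SIGMA[1] * p[2] - SIGMA[2] * p[1])

# i*hbar times the rate of change of (hbar/2) sigma_1, computed directly.
spin_term = 0.5 * hbar * (SIGMA[0] @ H - H @ SIGMA[0])

total = orbital_term + spin_term
print("largest element of the sum:", np.abs(total).max())
```

The sum vanishes, so within this check $m_1 + \tfrac12\hbar\sigma_1$ has zero rate of change while neither part is separately constant.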

The same vector $\boldsymbol\sigma$ fixes the directions of both the spin magnetic moment and
the spin angular momentum. If an electron in a certain state of spin has a spin
angular momentum of $\tfrac12\hbar$ in a particular direction, it will have a magnetic moment
$-e\hbar/2mc$ in the same direction.*

*[Extra paragraphs about particles other than electrons are included here in the fourth
edition.]


71. Transition to polar variables

For the further study of the motion of an electron in a central field of force
with the Hamiltonian (38), it is convenient to make a transformation to polar
coordinates, as was done in §38 in the non-relativistic case. We can introduce $r$ and
$p_r$ as before, but instead of $k$, the magnitude of the orbital angular momentum $\mathbf m$,
which is no longer a constant of the motion, we must now use the magnitude of
the total angular momentum $\mathbf M = \mathbf m + \tfrac12\hbar\boldsymbol\sigma$. Let us put

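$$j^2\hbar^2 = M_1^2 + M_2^2 + M_3^2 + \tfrac14\hbar^2. \tag{39}$$

The eigenvalues of $m_3$ are integral multiples of $\hbar$, those of $\tfrac12\hbar\sigma_3$ are $\pm\tfrac12\hbar$, and hence
those of $M_3$ must be half odd integral multiples of $\hbar$. It follows from the theory of
§36 that the eigenvalues of $|j|$ must be integers greater than zero.

If in formula (32) we take $\mathbf B = \mathbf C = \mathbf m$, we get

$$(\boldsymbol\sigma,\mathbf m)^2 = \mathbf m^2 + i(\boldsymbol\sigma,\ \mathbf m\times\mathbf m) = \mathbf m^2 - \hbar(\boldsymbol\sigma,\mathbf m) = (\mathbf m + \tfrac12\hbar\boldsymbol\sigma)^2 - 2\hbar(\boldsymbol\sigma,\mathbf m) - \tfrac34\hbar^2.$$

Hence

$$\{(\boldsymbol\sigma,\mathbf m) + \hbar\}^2 = \mathbf M^2 + \tfrac14\hbar^2.$$

Thus $(\boldsymbol\sigma,\mathbf m) + \hbar$ is a quantity whose square is $\mathbf M^2 + \tfrac14\hbar^2$ and we could, consistently
with equation (39), define $j\hbar$ as $(\boldsymbol\sigma,\mathbf m) + \hbar$. This would not be the most convenient
definition for $j$, however, since we would like to have $j$ a constant of the motion
and $(\boldsymbol\sigma,\mathbf m) + \hbar$ is not constant. We have, in fact, from applications of (32),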


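$$(\boldsymbol\sigma,\mathbf m)(\boldsymbol\sigma,\mathbf p) = i(\boldsymbol\sigma,\ \mathbf m\times\mathbf p)$$

and

$$(\boldsymbol\sigma,\mathbf p)(\boldsymbol\sigma,\mathbf m) = i(\boldsymbol\sigma,\ \mathbf p\times\mathbf m),$$

so that

$$(\boldsymbol\sigma,\mathbf m)(\boldsymbol\sigma,\mathbf p) + (\boldsymbol\sigma,\mathbf p)(\boldsymbol\sigma,\mathbf m) = i\sum_{123}\sigma_1\{m_2p_3 - m_3p_2 + p_2m_3 - p_3m_2\} = i\sum_{123}\sigma_1(2i\hbar)p_1 = -2\hbar(\boldsymbol\sigma,\mathbf p),$$

or

$$\{(\boldsymbol\sigma,\mathbf m) + \hbar\}(\boldsymbol\sigma,\mathbf p) + (\boldsymbol\sigma,\mathbf p)\{(\boldsymbol\sigma,\mathbf m) + \hbar\} = 0.$$

Thus $(\boldsymbol\sigma,\mathbf m) + \hbar$ anticommutes with one of the terms in the expression (38) for $H$,
namely the term $-c\rho_1(\boldsymbol\sigma,\mathbf p)$, and commutes with the other two. It follows that
$\rho_3\{(\boldsymbol\sigma,\mathbf m) + \hbar\}$ commutes with all the three terms in $H$ and is a constant of
the motion. But the square of $\rho_3\{(\boldsymbol\sigma,\mathbf m) + \hbar\}$ is also $\mathbf M^2 + \tfrac14\hbar^2$. We can therefore take

$$j\hbar = \rho_3\{(\boldsymbol\sigma,\mathbf m) + \hbar\}, \tag{40}$$

which gives us a convenient rational definition for $j$ which is consistent with (39)
and makes $j$ a constant of the motion. The eigenvalues of this $j$ are all positive
and negative integers, excluding zero.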
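
The relation $\{(\boldsymbol\sigma,\mathbf m) + \hbar\}^2 = \mathbf M^2 + \tfrac14\hbar^2$ and the integer eigenvalues can be illustrated in a finite-dimensional example. The Python sketch below builds the standard angular momentum matrices for a fixed orbital quantum number, here called `l` (a name and a choice, `l = 2`, introduced only for this illustration), combines them with the Pauli matrices, and inspects $(\boldsymbol\sigma,\mathbf m) + \hbar$. The factor $\rho_3$ in (40) is left out, so only the operator without $\rho_3$ is examined.

```python
import numpy as np

hbar = 1.0   # illustrative units (assumption)

S = [np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]], dtype=complex),
     np.array([[1, 0], [0, -1]], dtype=complex)]


def orbital_matrices(l):
    """Matrices of m_1, m_2, m_3 for orbital quantum number l (basis m = l, ..., -l)."""
    m_values = np.arange(l, -l - 1, -1)
    dim = len(m_values)
    m_plus = np.zeros((dim, dim), dtype=complex)
    for idx in range(1, dim):
        mv = m_values[idx]
        # Raising operator: |m> goes to hbar*sqrt(l(l+1) - m(m+1)) |m+1>.
        m_plus[idx - 1, idx] = hbar * np.sqrt(l * (l + 1) - mv * (mv + 1))
    m1 = (m_plus + m_plus.conj().T) / 2
    m2 = (m_plus - m_plus.conj().T) / 2j
    m3 = hbar * np.diag(m_values).astype(complex)
    return [m1, m2, m3]


l = 2                                   # sample orbital quantum number (assumption)
orbital = orbital_matrices(l)
dim = 2 * l + 1
identity = np.eye(2 * dim, dtype=complex)

# (sigma, m) + hbar on the product space (orbital index) x (spin index).
K = sum(np.kron(mk, sk) for mk, sk in zip(orbital, S)) + hbar * identity

# Total angular momentum M = m + (hbar/2) sigma and its square.
M = [np.kron(mk, np.eye(2)) + 0.5 * hbar * np.kron(np.eye(dim), sk)
     for mk, sk in zip(orbital, S)]
M_squared = sum(Mk @ Mk for Mk in M)

square_relation_holds = np.allclose(K @ K, M_squared + 0.25 * hbar**2 * identity)
distinct_eigenvalues = np.unique(np.round(np.linalg.eigvalsh(K) / hbar, 8))

print("square relation holds:", square_relation_holds)
print("distinct eigenvalues of ((sigma, m) + hbar)/hbar:", distinct_eigenvalues)
```

For this sample the eigenvalues are the integers $l + 1$ and $-l$, never zero, in agreement with the statement that $j$ takes all positive and negative integral values excluding zero once the factor $\rho_3 = \pm1$ is included.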
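By a further application of (32), we get

$$(\boldsymbol\sigma,\mathbf x)(\boldsymbol\sigma,\mathbf p) = (\mathbf x,\mathbf p) + i(\boldsymbol\sigma,\mathbf m) = rp_r + i\rho_3j\hbar - i\hbar, \tag{41}$$

with the help of (40) and also of equation (58) of §38. We introduce the linear
operator $\epsilon$ defined by

$$r\epsilon = \rho_1(\boldsymbol\sigma,\mathbf x). \tag{42}$$

Since $r$ commutes with $\rho_1$ and with $(\boldsymbol\sigma,\mathbf x)$, it must commute with $\epsilon$. We thus have

$$r^2\epsilon^2 = [\rho_1(\boldsymbol\sigma,\mathbf x)]^2 = (\boldsymbol\sigma,\mathbf x)^2 = \mathbf x^2 = r^2,$$

or

$$\epsilon^2 = 1.$$

Now $\rho_1(\boldsymbol\sigma,\mathbf p)$ commutes with $j$, and since there is symmetry between $\mathbf x$ and $\mathbf p$
so far as angular momentum is concerned, $\rho_1(\boldsymbol\sigma,\mathbf x)$ must also commute with $j$.
Hence $\epsilon$ commutes with $j$. Further, $\epsilon$ must commute with $p_r$, since we have

$$(\boldsymbol\sigma,\mathbf x)(\mathbf x,\mathbf p) - (\mathbf x,\mathbf p)(\boldsymbol\sigma,\mathbf x) = (\boldsymbol\sigma,\ \mathbf x(\mathbf x,\mathbf p) - (\mathbf x,\mathbf p)\mathbf x) = i\hbar(\boldsymbol\sigma,\mathbf x),$$

which gives

$$r\epsilon\,rp_r - rp_r\,r\epsilon = i\hbar r\epsilon,$$

or

$$r^2\epsilon p_r - r^2p_r\epsilon = 0.$$

From (41) and (42) we obtain

$$r\epsilon\rho_1(\boldsymbol\sigma,\mathbf p) = rp_r + i\rho_3j\hbar - i\hbar,$$

or

$$\rho_1(\boldsymbol\sigma,\mathbf p) = \epsilon(p_r - i\hbar/r) + i\epsilon\rho_3j\hbar/r.$$

Thus (38) becomes

$$H/c = -(e/c)A_0 - \epsilon(p_r - i\hbar/r) - i\epsilon\rho_3j\hbar/r - \rho_3mc.$$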

This gives our Hamiltonian expressed in terms of polar variables. It should be
noticed that $\epsilon$ and $\rho_3$ commute with all the other variables occurring in $H$ and
anticommute with one another. This means that we can take a representation
with $\rho_3$ diagonal in which $\epsilon$ and $\rho_3$ are represented respectively by the matrices

$$\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix},\qquad \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{43}$$

If $r$ is also diagonal in the representation, the representative $\langle r'\rho_3'|\rangle$ of a ket will
have two components, $\langle r', 1|\rangle = \psi_a(r')$ and $\langle r', -1|\rangle = \psi_b(r')$ say, referring to
the two rows and columns of the matrices (43).
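
The algebraic requirements on the pair of matrices (43) can be confirmed directly. The Python sketch below assumes the matrices as reconstructed above, with $\epsilon$ off-diagonal and $\rho_3$ diagonal, and also shows how each acts on a two-component representative $(\psi_a, \psi_b)$. The sample numbers used for $\psi_a$ and $\psi_b$ are arbitrary.

```python
import numpy as np

# Matrices (43): epsilon and rho_3 in the representation with rho_3 diagonal.
EPSILON = np.array([[0, -1j], [1j, 0]], dtype=complex)
RHO3 = np.array([[1, 0], [0, -1]], dtype=complex)
IDENTITY = np.eye(2, dtype=complex)

squares_to_one = (np.allclose(EPSILON @ EPSILON, IDENTITY)
                  and np.allclose(RHO3 @ RHO3, IDENTITY))
anticommute = np.allclose(EPSILON @ RHO3 + RHO3 @ EPSILON, 0)

# Action on a sample two-component representative (psi_a, psi_b).
psi = np.array([2.0, 3.0], dtype=complex)      # arbitrary sample values
epsilon_psi = EPSILON @ psi                    # (-i psi_b, i psi_a)
epsilon_rho3_psi = EPSILON @ RHO3 @ psi        # (i psi_b, i psi_a)

print("squares equal unity:", squares_to_one)
print("epsilon and rho_3 anticommute:", anticommute)
print("epsilon psi:", epsilon_psi)
print("epsilon rho_3 psi:", epsilon_rho3_psi)
```

These two actions are what turn the operator equation for $H$ into a pair of coupled equations for $\psi_a$ and $\psi_b$.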


72. The fine-structure of the energy-levels of hydrogen

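We shall now take the case of the hydrogen atom, for which $A_0 = e/r$,
and work out its energy-levels, given by the eigenvalues $H'$ of $H$. The equation
$(H' - H)|H'\rangle = 0$ which defines these eigenvalues, when written in terms of
representatives in the representation discussed above with $\epsilon$ and $\rho_3$ represented
by the matrices (43), gives the equations

$$\begin{aligned}
\left(\frac{H'}{c} + \frac{e^2}{cr}\right)\psi_a - \hbar\left(\frac{\partial}{\partial r} + \frac{1}{r}\right)\psi_b - \frac{j\hbar}{r}\psi_b + mc\,\psi_a &= 0,\\
\left(\frac{H'}{c} + \frac{e^2}{cr}\right)\psi_b + \hbar\left(\frac{\partial}{\partial r} + \frac{1}{r}\right)\psi_a - \frac{j\hbar}{r}\psi_a - mc\,\psi_b &= 0.
\end{aligned}$$

If we put

$$\frac{\hbar}{mc + H'/c} = a_1,\qquad \frac{\hbar}{mc - H'/c} = a_2, \tag{44}$$

these equations reduce to*

$$\begin{aligned}
\left(\frac{1}{a_1} + \frac{\alpha}{r}\right)\psi_a - \left(\frac{\partial}{\partial r} + \frac{j+1}{r}\right)\psi_b &= 0,\\
\left(\frac{1}{a_2} - \frac{\alpha}{r}\right)\psi_b - \left(\frac{\partial}{\partial r} - \frac{j-1}{r}\right)\psi_a &= 0,
\end{aligned} \tag{45}$$

where $\alpha = e^2/\hbar c$, which is a small number. We shall solve these equations by
a similar method to that used for equation (73) in §39.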


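*[$\psi_b$ in the second term of the left-hand side of the second equation replaced by $\psi_a$.]

Put

$$\psi_a = r^{-1}e^{-r/a}f,\qquad \psi_b = r^{-1}e^{-r/a}g, \tag{46}$$

introducing two new functions, $f$ and $g$, of $r$, where

$$a = (a_1a_2)^{1/2} = \hbar(m^2c^2 - H'^2/c^2)^{-1/2}. \tag{47}$$

Equations (45) become

$$\begin{aligned}
\left(\frac{1}{a_1} + \frac{\alpha}{r}\right)f - \left(\frac{\partial}{\partial r} - \frac{1}{a} + \frac{j}{r}\right)g &= 0,\\
\left(\frac{1}{a_2} - \frac{\alpha}{r}\right)g - \left(\frac{\partial}{\partial r} - \frac{1}{a} - \frac{j}{r}\right)f &= 0.
\end{aligned} \tag{48}$$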


We now try for a solution in which $f$ and $g$ are in the form of power series

$$f = \sum_s c_sr^s,\qquad g = \sum_s c'_sr^s, \tag{49}$$

in which consecutive values of $s$ differ by unity though these values need not
be integers. Substituting these expressions for $f$ and $g$ in (48) and picking out
coefficients of $r^{s-1}$, we obtain

$$\begin{aligned}
c_{s-1}/a_1 + \alpha c_s - (s+j)c'_s + c'_{s-1}/a &= 0,\\
-c'_{s-1}/a_2 + \alpha c'_s + (s-j)c_s - c_{s-1}/a &= 0.
\end{aligned} \tag{50}$$

By multiplying the first of these equations by $a$ and the second by $a_2$ and adding,
we eliminate both $c_{s-1}$ and $c'_{s-1}$, since from (47) $a/a_1 = a_2/a$. We are left with

$$[a\alpha + a_2(s-j)]c_s + [a_2\alpha - a(s+j)]c'_s = 0, \tag{51}$$

a relation which shows the connexion between the primed and unprimed $c$'s.

The boundary condition at $r = 0$ requires that $r\psi_a$ and $r\psi_b \to 0$ as $r \to 0$,
so from (46) $f$ and $g \to 0$ as $r \to 0$. Thus the series (49) must terminate on
the side of small $s$. If $s_0$ is the minimum value of $s$ for which $c_s$ and $c'_s$ do not
both vanish, we obtain from (50), by putting $s = s_0$ and $c_{s_0-1} = c'_{s_0-1} = 0$,

$$\begin{aligned}
\alpha c_{s_0} - (s_0 + j)c'_{s_0} &= 0,\\
\alpha c'_{s_0} + (s_0 - j)c_{s_0} &= 0,
\end{aligned} \tag{52}$$

which give

$$\alpha^2 = -s_0^2 + j^2.$$

Since the boundary condition requires that the minimum value of $s$ shall be greater
than zero, we must take

$$s_0 = +\sqrt{j^2 - \alpha^2}.$$

To investigate the convergence of the series (49) we shall determine the ratio
$c_s/c_{s-1}$ for large $s$. Equation (51) and the second of equations (50) give
approximately, when $s$ is large,

$$a_2c_s = ac'_s$$

and

$$sc_s = c_{s-1}/a + c'_{s-1}/a_2.$$

Hence

$$c_s/c_{s-1} = 2/as.$$

The series (49) will therefore converge like

$$\sum_s \frac{1}{s!}\left(\frac{2r}{a}\right)^s,$$

or $e^{2r/a}$. This result is similar to that obtained in §39 and allows us to infer,
as in §39, that all values of $H'$ are permissible for which $a$ is† imaginary,
i.e. from (47), for which $H' > mc^2$, while for $H' < mc^2$ we take $a$ to be positive
and then find that only those values of $H'$ are permissible for which the series (49)
terminate on the side of large $s$.

If the series (49) terminate with the terms $c_s$ and $c'_s$, so that $c_{s+1} = c'_{s+1} = 0$,
we obtain from (50) with $s + 1$ substituted for $s$

$$\begin{aligned}
c_s/a_1 + c'_s/a &= 0,\\
-c'_s/a_2 - c_s/a &= 0.
\end{aligned} \tag{53}$$

These two equations are equivalent on account of (47). When combined with (51),
they give

$$a_1[a\alpha + a_2(s-j)] = a[a_2\alpha - a(s+j)],$$

which reduces to

$$2a_1a_2s = a(a_2 - a_1)\alpha,$$

or

$$\frac{s}{a} = \frac{1}{2}\left(\frac{1}{a_1} - \frac{1}{a_2}\right)\alpha = \frac{H'}{c\hbar}\alpha,$$

with the help of (44). Squaring and using (47), we obtain

$$s^2(m^2c^2 - H'^2/c^2) = \alpha^2H'^2/c^2.$$

Hence

$$\frac{H'}{mc^2} = \left(1 + \frac{\alpha^2}{s^2}\right)^{-1/2}.$$

The $s$ here, which specifies the last term in the series, must be greater than $s_0$ by
some integer not less than zero. Calling this integer $n$, we have

$$s = n + \sqrt{j^2 - \alpha^2},$$

and thus

$$\frac{H'}{mc^2} = \left\{1 + \frac{\alpha^2}{\{n + \sqrt{j^2 - \alpha^2}\}^2}\right\}^{-1/2}. \tag{54}$$

This formula gives the discrete energy-levels of the hydrogen spectrum and was
first obtained by Arnold Sommerfeld working with Bohr’s orbit theory. There are
two quantum numbers $n$ and $j$ involved, but owing to $\alpha^2$ being very small
the energy depends almost entirely on $n + |j|$. Values of $n$ and $|j|$ that give the same
$n + |j|$ give rise to a set of energy-levels lying very close to one another, and to
the energy-level given by the non-relativistic formula (80) of §39 with $s = n + |j|$,
apart from the constant term $mc^2$.
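
Formula (54) is simple to evaluate numerically. The Python sketch below computes $H'/mc^2$ and the corresponding energy measured from $mc^2$ in electron-volts. The values of the fine-structure constant and of the electron rest energy are standard reference numbers assumed here; they are not given in the text. The function names are likewise introduced only for this illustration.

```python
import math

ALPHA = 7.2973525693e-3     # fine-structure constant e^2/(hbar c); reference value (assumption)
MC2_EV = 510998.95          # electron rest energy mc^2 in eV; reference value (assumption)


def energy_ratio(n, j, alpha=ALPHA):
    """Return H'/mc^2 from formula (54) for n = 0, 1, 2, ... and non-zero integer j."""
    if n < 0 or j == 0:
        raise ValueError("need n >= 0 and j != 0")
    if n == 0 and j > 0:
        raise ValueError("for n = 0 only negative j is allowed in this edition's conventions")
    s = n + math.sqrt(j * j - alpha * alpha)
    return (1.0 + alpha**2 / s**2) ** -0.5


def energy_ev(n, j):
    """Energy in eV measured from the rest energy mc^2 (negative for bound states)."""
    return (energy_ratio(n, j) - 1.0) * MC2_EV


ground_state_ev = energy_ev(0, -1)

# Two levels with the same n + |j| = 2: their separation is the fine structure.
fine_structure_ev = energy_ev(0, -2) - energy_ev(1, 1)

print(f"lowest level: {ground_state_ev:.5f} eV")
print(f"splitting of the n + |j| = 2 levels: {fine_structure_ev:.3e} eV")
```

The lowest level lies close to the non-relativistic value, while levels sharing the same $n + |j|$ differ only by an amount of relative order $\alpha^2$, which is the fine structure.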

†[‘pure’ omitted.]

We used equations (53) by combining them with (51), but this does not make
full use of (53) since the coefficients of $c_s$ and $c'_s$ in (51) may both vanish. In this
case we get, multiplying the first coefficient by $a$ and the second by $a_2$ and adding,

$$(a^2 + a_2^2)\alpha - 2aa_2j = 0.$$

With the help of (44) and (47) this gives

$$(a_1 + a_2)\alpha = 2aj,$$

$$\frac{2j}{\alpha} = \frac{a_1}{a} + \frac{a_2}{a} = \frac{a}{a_2} + \frac{a}{a_1} = \frac{2mca}{\hbar} = \frac{2mc}{(m^2c^2 - H'^2/c^2)^{1/2}},$$

or

$$\frac{H'^2}{m^2c^4} = 1 - \frac{\alpha^2}{j^2}.$$

Since $H'$ must be positive, this leads to

$$\frac{H'}{mc^2} = \sqrt{1 - \frac{\alpha^2}{j^2}}, \tag{55}$$

which is the value of $H'$ given by (54) when $n = 0$. The case $n = 0$ thus needs
further investigation to see whether the conditions (53) are then fulfilled.

With $n = 0$, the maximum value of $s$ is the same as the minimum, so equations
(53) with $s_0$ substituted for $s$ should agree with (52). Now (55) gives, from (44)
and (47),

$$\frac{1}{a_1} = \frac{mc}{\hbar}\left\{1 + \sqrt{1 - \frac{\alpha^2}{j^2}}\right\},\qquad \frac{1}{a} = \frac{mc}{\hbar}\frac{\alpha}{|j|},$$

so the first of equations (53) with $s_0$ substituted for $s$ gives

$$c_{s_0}\{|j| + \sqrt{j^2 - \alpha^2}\} + c'_{s_0}\alpha = 0.$$

This agrees with the second of equations (52) provided $j$ is negative. We can
conclude that, for $n = 0$, $j$ must be a negative‡ integer, while for the other values
of $n$ all non-zero integral values of $j$ are allowed.


73. Theory of the positron 

It has been mentioned in §67 that the wave equation for the electron admits
of twice as many solutions as it ought to, half of them referring to states with
negative values for the kinetic energy $cp_0 + eA_0$. This difficulty was introduced
as soon as we passed from equation (3) to equation (4) and is inherent in any
relativistic theory. It occurs also in classical relativistic theory, but is not then
serious since, owing to the continuity in the variation of all classical dynamical
variables, if the kinetic energy $cp_0 + eA_0$ is initially positive (when it must be greater
than or equal to $mc^2$), it cannot subsequently be negative (when it would have to
be less than or equal to $-mc^2$). In the quantum theory, however, discontinuous
transitions may take place, so that if the electron is initially in a state of positive
kinetic energy it may make a transition to a state of negative kinetic energy. It is
therefore no longer permissible simply to ignore the negative-energy states, as one
can do in the classical theory.

‡[Contrast with ‘j must be a positive integer’ in the fourth edition.]

Let us examine the negative-energy solutions of the equation

$$\left\{\left(p_0 + \frac{e}{c}A_0\right) + \alpha_1\left(p_1 + \frac{e}{c}A_1\right) + \alpha_2\left(p_2 + \frac{e}{c}A_2\right) + \alpha_3\left(p_3 + \frac{e}{c}A_3\right) + \alpha_mmc\right\}|x\rangle = 0 \tag{56}$$

a little more closely. For this purpose it is convenient to use a representation of
the $\alpha$'s in which all the elements of the matrices representing $\alpha_1$, $\alpha_2$ and $\alpha_3$ are real
and all those of the matrix representing $\alpha_m$ are† imaginary. Such a representation
may be obtained, for instance, from that of §67 by interchanging the expressions
for $\alpha_2$ and $\alpha_m$ in (7). If equation (56) is expressed as a matrix equation in
this representation and we put $-i$ for $i$ in all the matrix elements, we get,
remembering (1) and (2), the matrix form of the equation

$$\left\{\left(-p_0 + \frac{e}{c}A_0\right) + \alpha_1\left(-p_1 + \frac{e}{c}A_1\right) + \alpha_2\left(-p_2 + \frac{e}{c}A_2\right) + \alpha_3\left(-p_3 + \frac{e}{c}A_3\right) - \alpha_mmc\right\}|x^*\rangle = 0, \tag{57}$$

where $|x^*\rangle$ is the ket whose representative is the conjugate complex
of the representative of $|x\rangle$. Thus each solution $|x\rangle$ of (56) determines
uniquely a solution $|x^*\rangle$ of (57) with the conjugate complex representative.
Further, if the solution $|x\rangle$ of (56) belongs to a negative value for $cp_0 + eA_0$,
the corresponding solution $|x^*\rangle$ of (57) will belong to a positive value for $cp_0 - eA_0$.
But equation (57) is just what one would get if one substituted —e for e 
in (56). It follows that each negative-energy solution of (56) corresponds to 
a positive-energy solution of the wave equation obtained from (56) by substitution 
of —e for e, which solution represents an electron of charge +e (instead of —e, 
as we had up to the present) moving through the given electromagnetic field. 
Thus the unwanted solutions of (56) are connected with the motion of an electron
with a charge $+e$. (It is not possible, of course, with an arbitrary electromagnetic
field, to separate the solutions of (56) definitely into those referring to positive and
those referring to negative values for $cp_0 + eA_0$, as such a separation would imply
that transitions from one kind to the other do not occur. The preceding discussion
is therefore only a rough one, applying to the case when such a separation is
approximately possible.)
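
The representation used in this argument can be exhibited explicitly. The Python sketch below assumes that the representation of §67 has the usual form $\alpha_k = \rho_1\sigma_k$, $\alpha_m = \rho_3$ built from Pauli matrices (the equation (7) referred to is not reproduced here, so this form is an assumption), interchanges the expressions for $\alpha_2$ and $\alpha_m$, and checks that the resulting $\alpha_1$, $\alpha_2$, $\alpha_3$ are real while $\alpha_m$ is purely imaginary, with the anticommutation relations unchanged.

```python
import itertools

import numpy as np

S1 = np.array([[0, 1], [1, 0]], dtype=complex)
S2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
S3 = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Assumed starting representation: alpha_k = rho_1 sigma_k, alpha_m = rho_3.
a1 = np.kron(S1, S1)
a2_old = np.kron(S1, S2)
a3 = np.kron(S1, S3)
am_old = np.kron(S3, I2)

# Interchange the expressions for alpha_2 and alpha_m.
a2, am = am_old, a2_old
MATRICES = {"alpha_1": a1, "alpha_2": a2, "alpha_3": a3, "alpha_m": am}

reality = {name: ("real" if np.allclose(M.imag, 0) else
                  "imaginary" if np.allclose(M.real, 0) else "mixed")
           for name, M in MATRICES.items()}

# The algebra is untouched by the relabelling: squares are unity and
# different matrices anticommute.
algebra_ok = all(np.allclose(M @ M, np.eye(4)) for M in MATRICES.values()) and all(
    np.allclose(A @ B + B @ A, 0)
    for A, B in itertools.combinations(MATRICES.values(), 2)
)

print(reality)
print("anticommutation relations hold:", algebra_ok)
```

Taking the complex conjugate of every matrix element therefore leaves $\alpha_1$, $\alpha_2$, $\alpha_3$ unchanged and reverses the sign of $\alpha_m$, which is the step that leads from (56) to (57).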

In this way we are led to infer that the negative-energy solutions of (56) 
refer to the motion of a new kind of particle having the mass of an electron 
and the opposite charge. Such particles have been observed experimentally and 
are called positrons. We cannot, however, simply assert that the negative-energy 
solutions represent positrons, as this would make the dynamical relations all wrong. 
For instance, it is certainly not true that a positron has a negative kinetic 
energy. We must therefore establish the theory of the positrons on a somewhat 
different footing. We assume that nearly all the negative-energy states are occupied, 
with one electron in each state in accordance with the exclusion principle of Pauli. 
An unoccupied negative-energy state will now appear as something with a positive 
energy, since to make it disappear, i.e. to fill it up, we should have to add to it 
an electron with negative energy. We assume that these unoccupied negative-energy 
states are the positrons. 


†[‘pure’ omitted.]

These assumptions require there to be a distribution of electrons of infinite
density everywhere in the world. A perfect vacuum is a region where all the states
of positive energy are unoccupied and all those of negative energy are occupied.
In a perfect vacuum Maxwell’s equation

$$\operatorname{div}\boldsymbol{\mathcal E} = 0$$

must, of course, be valid. This means that the infinite distribution of
negative-energy electrons does not contribute to the electric field. Only departures
from the distribution in a vacuum will contribute to the electric density $\rho$ in
Maxwell’s equation

$$\operatorname{div}\boldsymbol{\mathcal E} = 4\pi\rho.$$

Thus there will be a contribution —e for each occupied state of positive energy 
and a contribution +e for each unoccupied state of negative energy. 

The exclusion principle will operate to prevent a positive-energy electron 
ordinarily from making transitions to states of negative energy. It will still 
be possible, however, for such an electron to drop into an unoccupied state 
of negative energy. In this case we should have an electron and positron
disappearing simultaneously, their energy being emitted in the form of radiation. 
The converse process would consist in the creation of an electron and a positron 
from electromagnetic radiation. 

The theory of the positron given here appears at first sight to treat the electrons 
and positrons on very different footings, but actually the fundamental ideas of 
the theory are symmetrical between the electrons and positrons. We should 
have an equivalent theory if we supposed the positrons to be the basic particles, 
described by wave equations of the form (9) with —e for e, and then supposed that 
nearly all the states of negative energy for the positrons are filled up, a hole in 
the distribution of negative-energy positrons being then interpreted as an ordinary 
electron. The theory could be developed consistently with the hypothesis that all 
the laws of physics are symmetrical between positive and negative electric charge. 


74. Relativistic notation

IN §63 a theory was given of the interaction of an atom with a field of radiation.
This theory was an approximate one, valid for radiation of long wave-length and
for a certain simplified model of the atom. Our present problem is to improve
this theory, and in particular to make it relativistic, so that it may be applied
to particles moving at high speed. We must first set up a notation suitable for
handling the relativistic equations with which we shall have to deal.

We choose units of space and time which make the velocity of light unity,
so that $c$ will no longer appear in our equations. A point in space-time is located
by its three Cartesian coordinates $x_1, x_2, x_3$ and its time $t = x_0$, which together
form a 4-vector $x_\mu$ ($\mu = 0,1,2,3$), or $x$ as we may write it in vector notation.
Two 4-vectors $a$ and $b$ have a Lorentz-invariant scalar product $(ab)$ given by

$$(ab) = a_0b_0 - a_1b_1 - a_2b_2 - a_3b_3 = a_0b_0 - (\mathbf a\mathbf b), \tag{1}$$

$(\mathbf a\mathbf b)$ being the three-dimensional scalar product of the three-dimensional parts of
$a$ and $b$.† To take into account the $-$ signs in (1), it is convenient to introduce


vector components with raised suffixes, defined by

$$a^0 = a_0,\quad a^1 = -a_1,\quad a^2 = -a_2,\quad a^3 = -a_3, \tag{2}$$

so that the scalar product $(ab)$ may be written

$$(ab) = a^\mu b_\mu = a_\mu b^\mu, \tag{3}$$

a summation being implied over a repeated (letter) suffix in a term.
The components $a^\mu$ are called the covariant components of the 4-vector $a$,
the original components $a_\mu$, which transform like the four coordinates $x_\mu$ of a point
in space-time, being called the contravariant components.
The fundamental tensor $g_{\mu\nu}$ is defined by

$$\begin{aligned}
g_{00} = 1,\quad g_{11} = g_{22} = g_{33} &= -1,\\
g_{\mu\nu} &= 0 \quad\text{for } \mu \neq \nu.
\end{aligned} \tag{4}$$

*[This chapter’s name is the only resemblance with the same named chapter in the 4th
edition.]
†[which is written in the rest of the text as (a, b)]

With its help we can write the rule (2) connecting the covariant and contravariant
components of a vector

$$a_\mu = g_{\mu\nu}a^\nu,$$

and we can write the scalar product $(ab)$ as

$$(ab) = g_{\mu\nu}a^\mu b^\nu.$$
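
The role of the fundamental tensor can be made concrete with a small numerical example. The Python sketch below represents $g_{\mu\nu}$ as a diagonal matrix, uses it to change the signs of the space components as in rule (2), and checks that the scalar product (1) is unchanged by a Lorentz transformation. The particular boost along $x_1$, its speed and the sample vectors are arbitrary choices made for the illustration, with the velocity of light taken as unity as in the text.

```python
import numpy as np

# Fundamental tensor (4) as a matrix, components ordered 0, 1, 2, 3.
G = np.diag([1.0, -1.0, -1.0, -1.0])


def change_suffix_position(a):
    """Apply rule (2): keep the time component, reverse the sign of the space components."""
    return G @ a


def scalar_product(a, b):
    """Lorentz-invariant scalar product (ab) = a_0 b_0 - a_1 b_1 - a_2 b_2 - a_3 b_3."""
    return a @ G @ b


def boost_along_x1(v):
    """Lorentz transformation matrix for relative velocity v along x_1 (units with c = 1)."""
    gamma = 1.0 / np.sqrt(1.0 - v * v)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = gamma
    L[0, 1] = L[1, 0] = -gamma * v
    return L


a = np.array([2.0, 0.3, -1.0, 0.5])     # sample 4-vectors (assumption)
b = np.array([1.5, -0.7, 0.2, 0.9])
L = boost_along_x1(0.6)                  # sample boost speed (assumption)

before = scalar_product(a, b)
after = scalar_product(L @ a, L @ b)

print("scalar product before the boost:", before)
print("scalar product after the boost: ", after)
```

The two values agree to rounding error, and the same product is obtained by contracting one vector with the sign-changed components of the other, as in (3).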

The operators $\partial/\partial x_\mu$ form the covariant components of a 4-vector,
and the contravariant components of the vector are written $\partial/\partial x^\mu$. Equations (1)
and (2) of §66 may be written

$$p^\mu = i\hbar\frac{\partial}{\partial x_\mu}\quad\text{or}\quad p_\mu = i\hbar\frac{\partial}{\partial x^\mu}, \tag{5}$$

and show how the momentum-energy 4-vector of a particle is related to
the operator of differentiation applied to the wave function.

The function $\delta(xx)$ is evidently Lorentz invariant. It vanishes everywhere
except on the light-cone with the origin as vertex, i.e. the three-dimensional
space $(xx) = 0$. This light-cone consists of two distinct parts, a future part,
for which $x_0 > 0$, and a past part, for which $x_0 < 0$. The function which
equals $\delta(xx)$ on the future part of the light-cone and $-\delta(xx)$ on the past part of
the light-cone is also Lorentz invariant. This function, which equals $\delta(xx)x_0/|x_0|$,
plays an important role in the dynamical theory of fields, so we introduce a special
notation for it. We define

$$\Delta(x) = 2\delta(xx)x_0/|x_0|. \tag{6}$$

This definition gives a meaning to the function $\Delta$ applied to any 4-vector.
With the help of (1) and of (9) of §15, we can express $\delta(xx)$ in the form

$$\delta(xx) = \tfrac12|\mathbf x|^{-1}\{\delta(x_0 - |\mathbf x|) + \delta(x_0 + |\mathbf x|)\}, \tag{7}$$

$|\mathbf x|$ being the length of the three-dimensional part of $x$, and then $\Delta(x)$ takes
the form

$$\Delta(x) = |\mathbf x|^{-1}\{\delta(x_0 - |\mathbf x|) - \delta(x_0 + |\mathbf x|)\}. \tag{8}$$

$\Delta(x)$ is defined to have the value zero at the origin, and evidently $\Delta(-x) = -\Delta(x)$.
Let us make a Fourier analysis of $\Delta(x)$. Using $d^4x$ to denote $dx_0\,dx_1\,dx_2\,dx_3$,
and $d^3x$ to denote $dx_1\,dx_2\,dx_3$ we have, for any 4-vector $k$,


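$$\int\Delta(x)e^{i(kx)}\,d^4x = \int|\mathbf x|^{-1}\{\delta(x_0 - |\mathbf x|) - \delta(x_0 + |\mathbf x|)\}e^{i[k_0x_0 - (\mathbf k\mathbf x)]}\,d^4x = \int|\mathbf x|^{-1}\{e^{ik_0|\mathbf x|} - e^{-ik_0|\mathbf x|}\}e^{-i(\mathbf k\mathbf x)}\,d^3x.$$

By introducing polar coordinates $|\mathbf x|, \theta, \phi$ in the three-dimensional $x_1x_2x_3$ space,
with the direction of the three-dimensional part of $k$ as pole, we get

$$\begin{aligned}
\int\Delta(x)e^{i(kx)}\,d^4x &= \iiint|\mathbf x|^{-1}\{e^{ik_0|\mathbf x|} - e^{-ik_0|\mathbf x|}\}e^{-i|\mathbf k||\mathbf x|\cos\theta}|\mathbf x|^2\sin\theta\,d\theta\,d\phi\,d|\mathbf x|\\
&= 2\pi\int_0^\infty\{e^{ik_0|\mathbf x|} - e^{-ik_0|\mathbf x|}\}\,d|\mathbf x|\int_0^\pi e^{-i|\mathbf k||\mathbf x|\cos\theta}|\mathbf x|\sin\theta\,d\theta\\
&= 2\pi i|\mathbf k|^{-1}\int_0^\infty\{e^{ik_0|\mathbf x|} - e^{-ik_0|\mathbf x|}\}\{e^{-i|\mathbf k||\mathbf x|} - e^{i|\mathbf k||\mathbf x|}\}\,d|\mathbf x|\\
&= 2\pi i|\mathbf k|^{-1}\int_{-\infty}^\infty\{e^{i(k_0 - |\mathbf k|)a} - e^{i(k_0 + |\mathbf k|)a}\}\,da\\
&= 4\pi^2i|\mathbf k|^{-1}\{\delta(k_0 - |\mathbf k|) - \delta(k_0 + |\mathbf k|)\}\\
&= 4\pi^2i\Delta(k).
\end{aligned} \tag{9}$$

Thus the Fourier analysis gives the same function again, with the coefficient $4\pi^2i$.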
Interchanging $k$ and $x$ in (9), we get

$$\Delta(x) = -(i/4\pi^2)\int\Delta(k)e^{i(kx)}\,d^4k. \tag{10}$$

Some of the important properties of $\Delta(x)$ can easily be deduced from its Fourier
resolution. In the first place equation (10) shows that $\Delta(x)$ can be resolved into
waves all travelling with the velocity of light. To get an equation for this result
we apply the operator $\Box$ to both sides of (10), thus

$$\Box\Delta(x) = -(i/4\pi^2)\int\Delta(k)\,\Box e^{i(kx)}\,d^4k = (i/4\pi^2)\int(kk)\Delta(k)e^{i(kx)}\,d^4k.$$

Now $(kk)\Delta(k) = 0$, and hence

$$\Box\Delta(x) = 0. \tag{11}$$
This equation holds throughout space-time. We can give a meaning to $\Box\Delta(x)$
at a point where $\Delta(x)$ is singular by taking the integral of $\Box\Delta(x)$ over
a small four-dimensional space surrounding the point and transforming it to
a three-dimensional surface integral by Gauss’s theorem. Equation (11) informs
us that the three-dimensional surface integral always vanishes.

The function $\Delta(x)$ vanishes all over the three-dimensional surface $x_0 = 0$.
Let us determine the value of $\partial\Delta(x)/\partial x_0$ on this surface. It evidently vanishes
everywhere except at the point $x_1 = x_2 = x_3 = 0$, where it has a singularity which
can be evaluated as follows. Differentiating both sides of (10) with respect to $x_0$,
we get


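$$\begin{aligned}
\partial\Delta(x)/\partial x_0
&= (1/4\pi^2)\int k_0\,\Delta(k)\,e^{i(kx)}\,d^4k \\
&= (1/4\pi^2)\int k_0\,|\mathbf{k}|^{-1}\{\delta(k_0 - |\mathbf{k}|) - \delta(k_0 + |\mathbf{k}|)\}\,e^{i(kx)}\,d^4k \\
&= (1/4\pi^2)\int \{\delta(k_0 - |\mathbf{k}|) + \delta(k_0 + |\mathbf{k}|)\}\,e^{i(kx)}\,d^4k.
\end{aligned}$$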


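Putting $x_0 = 0$ on both sides here, we get
$$\begin{aligned}
[\partial\Delta(x)/\partial x_0]_{x_0 = 0}
&= (1/4\pi^2)\int \{\delta(k_0 - |\mathbf{k}|) + \delta(k_0 + |\mathbf{k}|)\}\,e^{-i(\mathbf{k},\mathbf{x})}\,d^4k \\
&= (1/2\pi^2)\int e^{-i(\mathbf{k},\mathbf{x})}\,d^3k \\
&= 4\pi\,\delta(x_1)\,\delta(x_2)\,\delta(x_3).
\end{aligned}\tag{12}$$
Thus the ordinary $\delta$ singularity, with the coefficient $4\pi$, appears at the point
$x_1 = x_2 = x_3 = 0$.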


75. The quantum conditions for the field 

In §63 a theory of a field of radiation without interaction with matter 
was first developed and the interaction was taken into account subsequently. 
In the theory without interaction dynamical variables were introduced to describe 
the field, commutation relations were established for these dynamical variables, 
and a Hamiltonian was set up which made the dynamical variables vary correctly 
with the time. No approximations were made in this work. The theory would 
therefore be a quite satisfactory, exact theory of radiation without interaction with 
matter, were it not for one feature in it, namely our taking the scalar potential 
to be zero at the outset. This feature spoils the relativistic form of the theory 
and makes it unsuitable as a starting-point from which to develop an accurate 
theory of radiation in interaction with matter. We shall here consider how to put 
the theory of radiation without interaction with matter into relativistic form. 

We leave the scalar potential $A_0$ arbitrary and it then forms, together with
the vector potential $A_1, A_2, A_3$, a 4-vector $A_\mu$. The Maxwell equations (62) of §63
must then be generalized to
$$\Box A_\mu = 0, \qquad \partial A_\mu/\partial x_\mu = 0. \tag{13}$$
For the present we shall ignore the second of these equations and work only from
the first. This equation shows that each $A_\mu$ can be resolved into waves travelling
with the velocity of light, so that its Fourier resolution is of the form


$$A_\mu(x) = 2\int \delta(kk)\,A_{k\mu}\,e^{i(kx)}\,d^4k, \tag{14}$$


$x$ denoting a general point in space-time. The factor $\delta(kk)$ here ensures that
the integrand vanishes except for those values of the 4-vector $k$ which satisfy
$(kk) = 0$, and the coefficient $A_{k\mu}$ may be considered as undefined except when
$(kk) = 0$. Since $A_\mu(x)$ is real, we must have $A_{-k\mu} = \bar{A}_{k\mu}$, so (14) may also
be written
$$A_\mu(x) = 2\int_{k_0 > 0} \delta(kk)\{A_{k\mu}e^{i(kx)} + \bar{A}_{k\mu}e^{-i(kx)}\}\,d^4k. \tag{15}$$
With the help of formula (7) applied to $k$, this goes over into
$$\begin{aligned}
A_\mu(x) &= \int_{k_0 > 0} |\mathbf{k}|^{-1}\,\delta(k_0 - |\mathbf{k}|)\{A_{k\mu}e^{i(kx)} + \bar{A}_{k\mu}e^{-i(kx)}\}\,d^4k \\
&= \int \{A_{k\mu}e^{i(kx)} + \bar{A}_{k\mu}e^{-i(kx)}\}\,k_0^{-1}\,d^3k,
\end{aligned}\tag{16}$$
where it is implied in the last integrand that $k_0 = |\mathbf{k}|$, i.e. that $k$ is a 4-vector lying
on the future part of the light-cone.

Equation (16) is usually the most convenient form in which to give
the Fourier resolution of $A_\mu$. For $\mu = 1, 2, 3$ it agrees with (63) of §63,
except for the factor $k_0^{-1}$ in (16). This factor is a desirable one to have
in a relativistic theory, since the product $k_0^{-1}\,d^3k$ gives a Lorentz invariant
element on the light-cone $(kk) = 0$. The Lorentz invariance can be proved by
direct geometrical methods, and can also be inferred from the above analysis,
it being evident that the coefficient $A_{k\mu}$ introduced by (14) is a 4-vector for each
value of $k$ on the light-cone, so that the factor $\{\ \}$ in (16) is also a 4-vector,
and hence the remaining factors on the right-hand side of (16) must form
a four-dimensional scalar.
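
The invariance of the element $k_0^{-1}\,d^3k$ can also be checked numerically for a particular Lorentz transformation. The sketch below is not from the original text: it assumes units in which the velocity of light is 1, a boost of velocity `beta` along the axis 3, and hypothetical helper names (`boost_3`, `norm`, `jacobian_det`). It compares the Jacobian determinant of the map $\mathbf{k} \to \mathbf{k}'$ on the light-cone with the ratio $k_0'/k_0$; the two should agree if $d^3k'/k_0' = d^3k/k_0$.

```python
import math


def norm(k):
    """Length of a 3-vector; on the light-cone this is also k0."""
    return math.sqrt(sum(c * c for c in k))


def boost_3(k, beta):
    """Space part of the boosted light-cone 4-vector (k0 = |k|), boost along axis 3."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return (k[0], k[1], gamma * (k[2] - beta * norm(k)))


def jacobian_det(k, beta, h=1e-6):
    """Central-difference estimate of det(d k'_r / d k_c)."""
    m = [[0.0] * 3 for _ in range(3)]
    for c in range(3):
        plus, minus = list(k), list(k)
        plus[c] += h
        minus[c] -= h
        kp, km = boost_3(plus, beta), boost_3(minus, beta)
        for r in range(3):
            m[r][c] = (kp[r] - km[r]) / (2 * h)
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
```

For example, `jacobian_det((0.3, -1.2, 0.8), 0.6)` can be compared with `norm(boost_3((0.3, -1.2, 0.8), 0.6)) / norm((0.3, -1.2, 0.8))`.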

The quantities $A_\mu$ and $\partial A_\mu/\partial x_0$ for all $x_1, x_2, x_3$ at a given time $x_0 = t$
are sufficient, with the help of the first of equations (13), to determine
the potentials throughout space-time, so these quantities may be considered
as the dynamical variables describing the field of radiation considered as
a dynamical system. (They are the ordinary dynamical variables of the classical
theory, or the Heisenberg dynamical variables of the quantum theory.)
Define the quantities $A_{k\mu t}$ for $k_0 > 0$ by
$$A_{k\mu t} = A_{k\mu}\,e^{ik_0t}. \tag{17}$$
Then
$$\begin{aligned}
A_\mu(x) &= \int \{A_{k\mu t}\,e^{-i(\mathbf{k},\mathbf{x})} + \bar{A}_{k\mu t}\,e^{i(\mathbf{k},\mathbf{x})}\}\,k_0^{-1}\,d^3k, \\
\partial A_\mu(x)/\partial x_0 &= i\int \{A_{k\mu t}\,e^{-i(\mathbf{k},\mathbf{x})} - \bar{A}_{k\mu t}\,e^{i(\mathbf{k},\mathbf{x})}\}\,d^3k,
\end{aligned}\tag{18}$$
$k_0$ being understood as equal to $|\mathbf{k}|$ in the integrands here. These equations express
$A_\mu$ and $\partial A_\mu/\partial x_0$ at time $t$ as functions of $A_{k\mu t}$ and $\bar{A}_{k\mu t}$ not involving $t$ explicitly.
By reversing the three-dimensional Fourier analysis of equations (18) we can get
$A_{k\mu t}$ and $\bar{A}_{k\mu t}$ as functions of $A_\mu$ and $\partial A_\mu/\partial x_0$ at time $t$ not involving $t$ explicitly.
Hence we may take $A_{k\mu t}$ and $\bar{A}_{k\mu t}$ for all $\mu$ and all $k$ with $k_0 > 0$ as the dynamical
variables describing the system, instead of $A_\mu$ and $\partial A_\mu/\partial x_0$ at time $t$.

We must now determine the quantum conditions for the $A_{k\mu t}$ and $\bar{A}_{k\mu t}$.
In the first place, variables referring to different values of $k$ or of $\mu$ belong to
different degrees of freedom and therefore commute. We can get information about
the quantum conditions for variables referring to the same value of $k$ and $\mu$ from
the work of §63. To connect up with this work, we pass over to discrete $k$-values
in three-dimensional $k$-space. Equation (73) of §63 gives, on taking into account
that the present $A_k$ variables are $k_0$ times those of §63,
$$2\pi A_{klt} = \hbar^{1/2}\,k_0^{1/2}\,s_k^{1/2}\,\eta_{klt}. \tag{19}$$
Let us consider one particular discrete $k$-value for which $k_1 = k_2 = 0$, $k_3 = k_0 > 0$.
Then the polarization variable $l$ can take on two values referring to the two
directions 1 and 2, so equation (19) gives, with the help of the commutation
relations for the $\eta$’s and $\bar{\eta}$’s, equations (11) of §60,
$$\begin{aligned}
\bar{A}_{k1t}A_{k1t} - A_{k1t}\bar{A}_{k1t} &= \hbar k_0 s_k/4\pi^2, \\
\bar{A}_{k2t}A_{k2t} - A_{k2t}\bar{A}_{k2t} &= \hbar k_0 s_k/4\pi^2.
\end{aligned}\tag{20}$$
With the help of (17), these equations may be written in terms of the $A_{k\mu}$, $\bar{A}_{k\mu}$
for $k_0 > 0$
$$\begin{aligned}
\bar{A}_{k1}A_{k1} - A_{k1}\bar{A}_{k1} &= \hbar k_0 s_k/4\pi^2, \\
\bar{A}_{k2}A_{k2} - A_{k2}\bar{A}_{k2} &= \hbar k_0 s_k/4\pi^2.
\end{aligned}\tag{21}$$
The work of §63 gives us no information about $A_{k3}$ and $A_{k0}$.
However, we can now obtain the quantum conditions for $A_{k3}$ and $A_{k0}$ from
the theory of relativity. Equations (21) have to be built up into a relativistic set
of equations and the only simple way of doing so is by adding to them the two
further equations
$$\begin{aligned}
\bar{A}_{k3}A_{k3} - A_{k3}\bar{A}_{k3} &= \hbar k_0 s_k/4\pi^2, \\
\bar{A}_{k0}A_{k0} - A_{k0}\bar{A}_{k0} &= -\hbar k_0 s_k/4\pi^2.
\end{aligned}\tag{22}$$
Note the opposite sign in the last of these equations. The four equations (21)
and (22), together with the conditions that $A_{k\mu}$ and $\bar{A}_{k\nu}$ commute for $\mu \neq \nu$,
can then be written as a single tensor equation
$$\bar{A}_{k\mu}A_{k\nu} - A_{k\nu}\bar{A}_{k\mu} = -g_{\mu\nu}\,\hbar k_0 s_k/4\pi^2. \tag{23}$$
We get in this way the quantum conditions for all the dynamical variables.
Equation (23) can be extended to
$$\bar{A}_{k\mu}A_{k'\nu} - A_{k'\nu}\bar{A}_{k\mu} = -(g_{\mu\nu}/4\pi^2)\,\hbar k_0 s_k\,\delta_{kk'}. \tag{24}$$
Let us now return to continuous $k$-values. To convert $\delta_{kk'}$ to continuous
$k$-values we note that, for a general function $f(\mathbf{k})$ in three-dimensional $k$-space,
$$\sum_k f(\mathbf{k})\,\delta_{kk'} = f(\mathbf{k}') = \int f(\mathbf{k})\,\delta_3(\mathbf{k} - \mathbf{k}')\,d^3k, \tag{25}$$
where $\delta_3(\mathbf{k} - \mathbf{k}')$ is the three-dimensional $\delta$ function
$$\delta_3(\mathbf{k} - \mathbf{k}') = \delta(k_1 - k_1')\,\delta(k_2 - k_2')\,\delta(k_3 - k_3').$$
In order that (25) may conform to the standard formula connecting sums and
integrals, equation (52) of §62, we must have
$$s_k\,\delta_{kk'} = \delta_3(\mathbf{k} - \mathbf{k}'). \tag{26}$$
Thus (24) goes over to
$$\bar{A}_{k\mu}A_{k'\nu} - A_{k'\nu}\bar{A}_{k\mu} = -(g_{\mu\nu}/4\pi^2)\,\hbar k_0\,\delta_3(\mathbf{k} - \mathbf{k}'). \tag{27}$$
This equation, together with the equations
$$\begin{aligned}
A_{k\mu}A_{k'\nu} - A_{k'\nu}A_{k\mu} &= 0, \\
\bar{A}_{k\mu}\bar{A}_{k'\nu} - \bar{A}_{k'\nu}\bar{A}_{k\mu} &= 0,
\end{aligned}\tag{28}$$
provide the quantum conditions for the field quantities in the theory with
continuous $k$-values. We have here the formalism which must be used instead
of (11) of §60 for dealing with a set of oscillators whose number is a continuous
infinity, equal to the number of points in a volume. The number of degrees of
freedom of the system is a continuous infinity, and the $\delta$ function appears in
the commutation relations instead of the two-suffix $\delta$ symbol.

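The quantum conditions for the field may also be expressed in terms of
the potentials $A_\mu(x)$ at different points $x$ in space-time. We have from (16),
(27) and (28)
$$\begin{aligned}
[A_\mu(x), A_\nu(x')]
&= \iint [A_{k\mu}e^{i(kx)} + \bar{A}_{k\mu}e^{-i(kx)},\ A_{k'\nu}e^{i(k'x')} + \bar{A}_{k'\nu}e^{-i(k'x')}]\,k_0^{-1}\,{k_0'}^{-1}\,d^3k\,d^3k' \\
&= -(ig_{\mu\nu}/4\pi^2)\iint \{e^{i(kx)}e^{-i(k'x')} - e^{-i(kx)}e^{i(k'x')}\}\,\delta_3(\mathbf{k} - \mathbf{k}')\,{k_0'}^{-1}\,d^3k\,d^3k' \\
&= -(ig_{\mu\nu}/4\pi^2)\int \{e^{i(k,\,x - x')} - e^{-i(k,\,x - x')}\}\,k_0^{-1}\,d^3k.
\end{aligned}\tag{29}$$
This three-dimensional $k$-integral is easily seen to be equal to the four-dimensional
$k$-integral over the whole of four-dimensional $k$-space
$$\begin{aligned}
&-(ig_{\mu\nu}/4\pi^2)\int |\mathbf{k}|^{-1}\{\delta(k_0 - |\mathbf{k}|) - \delta(k_0 + |\mathbf{k}|)\}\,e^{i(k,\,x - x')}\,d^4k \\
&\qquad = -(ig_{\mu\nu}/4\pi^2)\int \Delta(k)\,e^{i(k,\,x - x')}\,d^4k.
\end{aligned}$$
Evaluating this integral with the help of formula (10), we get finally
$$[A_\mu(x), A_\nu(x')] = g_{\mu\nu}\,\Delta(x - x'). \tag{30}$$
We see that potentials at two points in space-time always commute unless the line
joining the two points is a null-line (i.e. the track of a light-ray).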

Let us determine the quantum conditions for the quantities $A_\mu$ and $\partial A_\mu/\partial x_0$
for various $x_1, x_2, x_3$ at a given time $x_0 = t$. Using the suffix $t$ to denote a quantity
taken at the time $x_0 = t$, we have, putting $x_0 = x_0' = t$,
$$[A_{\mu t}(\mathbf{x}), A_{\nu t}(\mathbf{x}')] = 0. \tag{31}$$
Differentiating (30) with respect to $x_0$ and then putting $x_0 = x_0' = t$, we get
$$\left[\left\{\frac{\partial A_\mu(\mathbf{x})}{\partial x_0}\right\}_t,\ A_{\nu t}(\mathbf{x}')\right]
= g_{\mu\nu}\left[\frac{\partial\Delta(x - x')}{\partial x_0}\right]_{x_0 = x_0'}
= 4\pi g_{\mu\nu}\,\delta_3(\mathbf{x} - \mathbf{x}') \tag{32}$$
from (12). Finally, differentiating (30) with respect to $x_0$ and $x_0'$ and then putting
$x_0 = x_0' = t$, we get
$$\left[\left\{\frac{\partial A_\mu(\mathbf{x})}{\partial x_0}\right\}_t,\ \left\{\frac{\partial A_\nu(\mathbf{x}')}{\partial x_0'}\right\}_t\right] = 0, \tag{33}$$
since $\partial^2\Delta(x)/\partial x_0^2 = 0$ for $x_0 = 0$. We can, as stated on p. 234, take the quantities
$A_{\mu t}(\mathbf{x})$ and $\{\partial A_\mu(\mathbf{x})/\partial x_0\}_t$ as the dynamical variables describing the system,
and equations (31), (32) and (33) are then the quantum conditions for these
dynamical variables. From the form of these quantum conditions we see that,
apart from numerical coefficients, the $A_{\mu t}(\mathbf{x})$’s can be looked upon as a set of
coordinates $q$ and the $\{\partial A_\mu(\mathbf{x})/\partial x_0\}_t$’s as their conjugate momenta $p$, there being
a $\delta$ function on the right-hand side of (32) instead of a two-suffix $\delta$ symbol on
account of the number of these $q$’s and $p$’s being a continuous infinity. The quantum
conditions (31), (32), (33) still hold if the radiation is in interaction with matter,
and indeed in all Lorentz frames of reference, but the more general condition
(30) need not then hold, since the commutation relations connecting dynamical
variables at different times in general get altered by interaction.

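The electric and magnetic fields $\mathcal{E}$ and $\mathcal{H}$ form in relativistic notation
a 6-vector $F_{\mu\nu} = -F_{\nu\mu}$,
$$\begin{aligned}
\mathcal{E}_1 &= F_{10}, & \mathcal{E}_2 &= F_{20}, & \mathcal{E}_3 &= F_{30}, \\
\mathcal{H}_1 &= F_{23}, & \mathcal{H}_2 &= F_{31}, & \mathcal{H}_3 &= F_{12}.
\end{aligned}\tag{34}$$
The equations connecting $\mathcal{E}$ and $\mathcal{H}$ with the potentials may be written in
tensor form
$$F_{\mu\nu} = \frac{\partial A_\nu}{\partial x^\mu} - \frac{\partial A_\mu}{\partial x^\nu}. \tag{35}$$
The quantum conditions connecting $\mathcal{E}$ and $\mathcal{H}$ at different points in space-time
can be obtained immediately from (35) and (30).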


76. The Hamiltonian for the field 
The Hamiltonian for the field, $H_F$ say, must be chosen so as to give the correct
Heisenberg equations of motion for the dynamical variables. This suffices to fix it,
except for an arbitrary constant. From (17), the dynamical variables $A_{k\mu t}$ vary
with $t$ or $x_0$ according to the law
$$dA_{k\mu t}/dt = ik_0\,A_{k\mu t}.$$
Thus from the Heisenberg equations of motion
$$i\hbar\,dA_{k\mu t}/dt = A_{k\mu t}H_F - H_F A_{k\mu t}$$
we get
$$-\hbar k_0\,A_{k\mu t} = A_{k\mu t}H_F - H_F A_{k\mu t}. \tag{36}$$
We must choose $H_F$ to satisfy these conditions.

Let us pass over to discrete $k$-values and consider again one particular $k$-value
for which $k_1 = k_2 = 0$, $k_3 = k_0 > 0$. We then have the commutation relations (20),
which show us that, so far as concerns the degrees of freedom $A_{k1t}$ and $A_{k2t}$,
$H_F$ must consist of the terms
$$4\pi^2\,(A_{k1t}\bar{A}_{k1t} + A_{k2t}\bar{A}_{k2t})\,s_k^{-1}, \tag{37}$$
as these terms substituted for $H_F$ in the right-hand side of (36) make it
equal the left-hand side. These terms are in agreement with (72) of §63,
if one takes into account that the $A_k$’s there differ from the present ones by
the factor $k_0$. For the degrees of freedom $A_{k3t}$ and $A_{k0t}$, we have, from (22),
the commutation relations
$$\begin{aligned}
\bar{A}_{k3t}A_{k3t} - A_{k3t}\bar{A}_{k3t} &= \hbar k_0 s_k/4\pi^2, \\
\bar{A}_{k0t}A_{k0t} - A_{k0t}\bar{A}_{k0t} &= -\hbar k_0 s_k/4\pi^2,
\end{aligned}\tag{38}$$
which show similarly that $H_F$ contains the terms
$$4\pi^2\,(A_{k3t}\bar{A}_{k3t} - A_{k0t}\bar{A}_{k0t})\,s_k^{-1}.$$
It is convenient to change this by a constant and to take instead the terms
$$4\pi^2\,(A_{k3t}\bar{A}_{k3t} - \bar{A}_{k0t}A_{k0t})\,s_k^{-1}, \tag{39}$$
as it will be found later that (39) gives no zero-point energy to $H_F$. The total
Hamiltonian is now
$$H_F = 4\pi^2\sum_k (A_{k1}\bar{A}_{k1} + A_{k2}\bar{A}_{k2} + A_{k3}\bar{A}_{k3} - \bar{A}_{k0}A_{k0})\,s_k^{-1} \tag{40}$$
$$\phantom{H_F} = 4\pi^2\int (A_{k1}\bar{A}_{k1} + A_{k2}\bar{A}_{k2} + A_{k3}\bar{A}_{k3} - \bar{A}_{k0}A_{k0})\,d^3k, \tag{41}$$
if we pass back to continuous $k$-values. This $H_F$ gives, according to (17) of §29,
$$e^{iH_Ft/\hbar}\,A_{k\mu}\,e^{-iH_Ft/\hbar} = A_{k\mu t} = A_{k\mu}\,e^{ik_0t}. \tag{42}$$


We may call ‘longitudinal degrees of freedom’ the degrees of freedom
associated with the variables $A_{k0t}$ and $A_{k3t}$ for the particular $k$-value considered
above, in contradistinction to the ‘transverse degrees of freedom’ associated with
the variables $A_{k1t}$ and $A_{k2t}$. For a general $k$-value $A_{k3t}$ is to be replaced by $A_{k\kappa t}$,
$\kappa$ being a unit three-dimensional vector in the direction of the three-dimensional
part of $k$. The longitudinal degrees of freedom do not occur in the theory
of §63, $A_{k0}$ and $A_{k\kappa}$ there being zero. The present Hamiltonian (40) differs from
the Hamiltonian (63) of §63 by the terms referring to the longitudinal degrees
of freedom, these terms being needed now to make $A_{k0t}$ and $A_{k\kappa t}$ vary correctly
with $t$.

We see from (39) that the contribution of the degree of freedom $A_{k0t}$ to
the Hamiltonian is negative. This means that the dynamical system formed
by the variables $A_{k0t}$, $\bar{A}_{k0t}$ is a harmonic oscillator of negative energy. It is
rather surprising that such an unphysical idea as negative energy should appear in
the theory in this way. The negative energy is a necessary consequence of the $-$ sign
on the right-hand side of the second of equations (38) and this $-$ sign is demanded
by relativity. We shall see in the next section that the negative energy associated
with the degree of freedom $A_{k0t}$ is always compensated by the positive energy
associated with the corresponding longitudinal degree of freedom* $A_{k\kappa t}$, so that it
never shows up in practice.

The theory of a harmonic oscillator of negative energy may be built up in
the same way as that of an ordinary harmonic oscillator given in §34. Expressing
the $A_{k0t}$ of the second of equations (38) in terms of $\bar{\eta}$ by means of
$$2\pi A_{k0t} = \hbar^{1/2}\,k_0^{1/2}\,s_k^{1/2}\,\bar{\eta},$$

*[Original: $A_{k3t}$]

we have $\bar{\eta}$ satisfying the same commutation relation with $\eta$ as in §34, and the energy
in this degree of freedom is $-\hbar k_0\,\eta\bar{\eta}$, from (39). The work of §34 now shows that
the maximum eigenvalue of the energy is zero, the other eigenvalues being negative
integral multiples of $\hbar k_0$. Introducing the normalized eigenket of the energy
belonging to the eigenvalue zero as the standard ket $|0\rangle$, we have
$$\bar{\eta}\,|0\rangle = 0$$
as in §34, and $\eta^n|0\rangle$ with $n$ a positive integer is the ket corresponding to
the $n$th quantum state, which has the energy $-n\hbar k_0$. Any ket can be expressed as
a power series in $\eta$ multiplied into $|0\rangle$.
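
The level scheme of the negative-energy oscillator can be illustrated with truncated matrices. This is only a sketch, not part of the original text: it assumes the representation $\eta|n\rangle = (n+1)^{1/2}|n+1\rangle$ in the basis of the $n$th quantum states, measures energy in units of $\hbar k_0$, and uses hypothetical helper names. The truncation to finitely many levels spoils the commutation relation in the last row only.

```python
import math


def raising_matrix(n_levels):
    """Matrix of eta in the basis |0>, |1>, ...: eta|n> = sqrt(n+1)|n+1>, truncated."""
    m = [[0.0] * n_levels for _ in range(n_levels)]
    for n in range(n_levels - 1):
        m[n + 1][n] = math.sqrt(n + 1)
    return m


def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def energy_levels(n_levels):
    """Diagonal of -eta * eta_bar, i.e. the energies in units of hbar*k0."""
    eta = raising_matrix(n_levels)
    product = matmul(eta, transpose(eta))
    return [-product[n][n] for n in range(n_levels)]


def commutator_diagonal(n_levels):
    """Diagonal of eta_bar*eta - eta*eta_bar; equals 1 except in the truncated last row."""
    eta = raising_matrix(n_levels)
    eta_bar = transpose(eta)
    left, right = matmul(eta_bar, eta), matmul(eta, eta_bar)
    return [left[n][n] - right[n][n] for n in range(n_levels)]
```

Calling `energy_levels(5)` gives values close to 0, -1, -2, -3, -4, so zero is the maximum eigenvalue and the others are negative integral multiples of the unit.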

For the whole field of radiation we can introduce a standard ket $\rangle_F$, for which
there is zero energy in each degree of freedom. Any state of the field of radiation
then corresponds to a ket of the form of a power series in the various $\eta$-variables
multiplied into $\rangle_F$. We can replace the power series in the $\eta$-variables by a power
series in the Fourier coefficients $A_{k1}$, $A_{k2}$, $A_{k3}$, $\bar{A}_{k0}$. The different terms in
the power series correspond to different degrees of excitation of the various Fourier
components of the field. Alternatively, they correspond to different numbers of
photons present in the various stationary states of a photon, there being now
longitudinal photons associated with the longitudinal degrees of freedom, as well
as the usual transverse ones. (The physical significance of the longitudinal photons
will become clear later, see p. 256.) If we are working with continuous $k$-values,
the power series in $A_{k1}$, $A_{k2}$, $A_{k3}$, $\bar{A}_{k0}$ becomes a sum of integrals of degree 0, 1,
2, 3,... in these variables. Any of the linear operators $\bar{A}_{k1}$, $\bar{A}_{k2}$, $\bar{A}_{k3}$, $A_{k0}$ applied
to $\rangle_F$ gives zero.


77. The supplementary conditions 
We must now go back to the second of the Maxwell equations (13), which we have 
ignored so far. We cannot take this equation over directly into the quantum 
theory without getting inconsistencies. The left-hand side of this equation does not 
commute with $A_\nu(x')$, according to the quantum conditions (30), so this left-hand
side cannot vanish. The way out of the difficulty was shown by Enrico Fermi.¹
It consists in adopting a less stringent equation, namely the equation
$$(\partial A_\mu/\partial x_\mu)\,|\rangle = 0, \tag{43}$$
and assuming it to hold for any $|\rangle$ corresponding to a state that can actually
occur in nature. There is one equation (43) for each point in space-time and these
equations must all hold for any ket corresponding to a state that can actually occur.
The ket in (43) does not depend on $t$, since we are using the Heisenberg picture,
in which each state corresponds to a fixed ket.


¹Fermi, E. (1932). ‘Quantum Theory of Radiation.’ Reviews of Modern Physics, 4(1),
pp. 87-132. {Especially from page 125} doi:10.1103/revmodphys.4.87

We shall call a condition such as (43), which a ket has to satisfy to correspond
to an actual state, a supplementary condition. The existence of supplementary 
conditions in the theory does not mean any departure from or modification in 
the general principles of quantum mechanics. The principle of superposition 
of states and the whole of the general theory of states, dynamical variables, 
and observables, as given in Chapter II, apply also when there are supplementary 
conditions, provided we impose a further requirement on a linear operator in order 
that it may represent an observable, namely the requirement that, when it operates 
on any ket satisfying the supplementary conditions, it changes this ket into 
another ket satisfying the supplementary conditions. We have already had 
an example of supplementary conditions in the theory of systems containing 
several similar particles. The condition that only symmetrical wave functions, 
or only antisymmetrical wave functions, represent states that can actually occur 
in nature, is precisely of the same type as condition (43) and is what we are 
now calling a supplementary condition. In this theory the further requirement on 
linear operators in order that they shall represent observables is that they shall be 
symmetrical between the similar particles. 

When we introduce supplementary conditions into our theory we must verify 
that they are not too restrictive to allow any ket at all to satisfy them. If we have 
more than one supplementary condition, we can deduce further supplementary 
conditions from them by taking P.B.s of the operators in them; thus if we have 

$$U\,|\rangle = 0, \qquad V\,|\rangle = 0, \tag{44}$$
we can deduce
$$[U, V]\,|\rangle = 0, \qquad [U, [U, V]]\,|\rangle = 0, \tag{45}$$
and so on. To verify that our supplementary conditions are not too restrictive, 
we have to look into all the further supplementary conditions obtainable by this 
procedure to see that they can be satisfied, which we can usually do by showing that 
after a certain point the further supplementary conditions are all either identically 
satisfied or repetitions of the previous ones. 
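
The first deduction can be written out in one line. A sketch, using only the relation $uv - vu = i\hbar[u, v]$ between a commutator and a quantum P.B.:

$$i\hbar\,[U, V]\,|\rangle = U\bigl(V\,|\rangle\bigr) - V\bigl(U\,|\rangle\bigr) = 0,$$

each term on the right vanishing separately by (44). Repeating the argument with $[U, V]$ in place of $V$ gives the further conditions.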

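To apply this procedure to the supplementary conditions (43), we work out
the P.B. of two of the linear operators $\partial A_\mu/\partial x_\mu$, say those at the points $x$ and $x'$
in space-time. We have from (30)
$$\left[\frac{\partial A_\mu(x)}{\partial x_\mu},\ \frac{\partial A_\nu(x')}{\partial x'_\nu}\right]
= g_{\mu\nu}\,\frac{\partial^2\Delta(x - x')}{\partial x_\mu\,\partial x'_\nu}
= \frac{\partial^2\Delta(x - x')}{\partial x_\mu\,\partial {x'}^{\mu}}
= -\Box\Delta(x - x') = 0$$
from (11). Thus the conditions (45) are all identically satisfied, so our
supplementary conditions are not too restrictive.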

We should verify also that the supplementary conditions are consistent with 
the equations of motion, in the present case with the first of equations (13). 
This consistency is immediately evident in the quantum theory, as in 
the classical theory. 
Since the second of equations (13) is not valid and has to be replaced by
a supplementary condition, any consequences of this equation in the ordinary
Maxwell theory will not be valid in the quantum theory and will have to be replaced
by supplementary conditions. The equations
$$\operatorname{div}\mathcal{H} = 0, \qquad \partial\mathcal{H}/\partial t = -\operatorname{curl}\mathcal{E} \tag{46}$$
follow simply from the equations defining $\mathcal{E}$ and $\mathcal{H}$ in terms of the potentials,
namely (35), and are therefore valid also in the quantum theory. The other
Maxwell equations for empty space, however, namely
$$\operatorname{div}\mathcal{E} = 0, \qquad \partial\mathcal{E}/\partial t = \operatorname{curl}\mathcal{H},$$
or
$$\partial F_{\mu\nu}/\partial x_\nu = 0,$$
can be derived only with the help of the second of equations (13), as one sees at
once if one substitutes for $F_{\mu\nu}$ its value given by (35), and are thus not valid in
the quantum theory. They must be replaced by
$$\{\operatorname{div}\mathcal{E}\}\,|\rangle = 0, \qquad \{\partial\mathcal{E}/\partial t - \operatorname{curl}\mathcal{H}\}\,|\rangle = 0, \tag{47}$$
holding for any $|\rangle$ corresponding to a state that can actually occur.

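The field quantities $\mathcal{E}$ and $\mathcal{H}$ at any point in space-time commute with all
the operators in the supplementary conditions, since from (35) and (30)
$$\begin{aligned}
\left[F_{\mu\nu}(x),\ \frac{\partial A_\sigma(x')}{\partial x'_\sigma}\right]
&= \left[\frac{\partial A_\nu(x)}{\partial x^\mu} - \frac{\partial A_\mu(x)}{\partial x^\nu},\ \frac{\partial A_\sigma(x')}{\partial x'_\sigma}\right] \\
&= g_{\nu\sigma}\,\frac{\partial^2\Delta(x - x')}{\partial x^\mu\,\partial x'_\sigma} - g_{\mu\sigma}\,\frac{\partial^2\Delta(x - x')}{\partial x^\nu\,\partial x'_\sigma} \\
&= \frac{\partial^2\Delta(x - x')}{\partial x^\mu\,\partial {x'}^{\nu}} - \frac{\partial^2\Delta(x - x')}{\partial x^\nu\,\partial {x'}^{\mu}} = 0.
\end{aligned}$$
It follows that if $\mathcal{E}$ or $\mathcal{H}$ is multiplied into a ket satisfying the supplementary
conditions, it will give another ket satisfying the supplementary conditions,
and hence it fulfils the new requirement for being an observable. The potentials
do not satisfy this requirement.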

By making a Fourier resolution of the left-hand side of equation (43) we get
the equations
$$k^\mu A_{k\mu}\,|\rangle = 0, \qquad k^\mu\bar{A}_{k\mu}\,|\rangle = 0 \tag{48}$$
holding for all values of the 4-vector $k$ with $k^\mu k_\mu = 0$ and $k_0 > 0$. This is another
form for the supplementary conditions. The P.B. of the operators $k^\mu A_{k\mu}$ and $k^\nu\bar{A}_{k\nu}$
here, of course, vanishes, as may be verified directly from (23) or (27).

To examine the consequences of equations (48), let us work with discrete
$k$-values and consider first one particular $k$-value for which $k_1 = k_2 = 0$,
$k_3 = k_0 > 0$, as we have done on previous occasions. For this $k$-value equations
(48) become
$$(A_{k0} - A_{k3})\,|\rangle = 0, \qquad (\bar{A}_{k0} - \bar{A}_{k3})\,|\rangle = 0. \tag{49}$$
Multiplying the first of these on the left by $(\bar{A}_{k0} + \bar{A}_{k3})$ and the second by
$(A_{k0} + A_{k3})$ and adding, we get
$$(\bar{A}_{k0}A_{k0} + A_{k0}\bar{A}_{k0} - \bar{A}_{k3}A_{k3} - A_{k3}\bar{A}_{k3})\,|\rangle = 0$$
or
$$2(\bar{A}_{k0}A_{k0} - A_{k3}\bar{A}_{k3})\,|\rangle = 0$$
with the help of (22). This shows that the energy in the two longitudinal degrees
of freedom for this $k$-value, namely expression (39), vanishes for any state that
occurs in nature. The same result holds for all $k$-values. Thus the supplementary
conditions ensure that the negative energy in any $A_{k0t}$ degree of freedom is always
exactly cancelled by the positive energy in the corresponding $A_{k\kappa t}$ degree of freedom.
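
The algebra behind this cancellation can be spelled out. The following is a sketch in which $c$ is a shorthand, introduced only here, for $\hbar k_0 s_k/4\pi^2$. Expanding the two products gives

$$\begin{aligned}
&(\bar{A}_{k0} + \bar{A}_{k3})(A_{k0} - A_{k3}) + (A_{k0} + A_{k3})(\bar{A}_{k0} - \bar{A}_{k3}) \\
&\qquad = \bar{A}_{k0}A_{k0} + A_{k0}\bar{A}_{k0} - \bar{A}_{k3}A_{k3} - A_{k3}\bar{A}_{k3}
 + (\bar{A}_{k3}A_{k0} - A_{k0}\bar{A}_{k3}) + (A_{k3}\bar{A}_{k0} - \bar{A}_{k0}A_{k3}),
\end{aligned}$$

and the last two brackets vanish because variables with different suffixes $\mu$ commute. The relations (22) then give $A_{k0}\bar{A}_{k0} = \bar{A}_{k0}A_{k0} + c$ and $\bar{A}_{k3}A_{k3} = A_{k3}\bar{A}_{k3} + c$, so the constants cancel and

$$\bar{A}_{k0}A_{k0} + A_{k0}\bar{A}_{k0} - \bar{A}_{k3}A_{k3} - A_{k3}\bar{A}_{k3} = 2(\bar{A}_{k0}A_{k0} - A_{k3}\bar{A}_{k3}).$$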
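Let us express the $|\rangle$ in (48) in the form
$$|\rangle = \psi\,\rangle_F,$$
where $\rangle_F$ is the standard ket for the field of radiation introduced in
the preceding section, corresponding to zero energy in each degree of freedom,
and $\psi$ is a power series in the operators $A_{k1}$, $A_{k2}$, $A_{k3}$, $\bar{A}_{k0}$. Since
$A_{k0}\,\rangle_F = \bar{A}_{k3}\,\rangle_F = 0$, we get from (49), for the $k$-value to which these
equations refer,
$$(A_{k0}\psi - \psi A_{k0})\,\rangle_F = A_{k3}\,\psi\,\rangle_F, \qquad (\bar{A}_{k3}\psi - \psi\bar{A}_{k3})\,\rangle_F = \bar{A}_{k0}\,\psi\,\rangle_F.$$
With the help of the commutation relations (22), these equations reduce to
$$\frac{\hbar k_0 s_k}{4\pi^2}\,\frac{\partial\psi}{\partial\bar{A}_{k0}}\,\rangle_F = A_{k3}\,\psi\,\rangle_F, \qquad
\frac{\hbar k_0 s_k}{4\pi^2}\,\frac{\partial\psi}{\partial A_{k3}}\,\rangle_F = \bar{A}_{k0}\,\psi\,\rangle_F,$$
showing that $\psi$ is of the form
$$\psi = e^{4\pi^2 \bar{A}_{k0}A_{k3}/\hbar k_0 s_k}\,\psi_1,$$
where $\psi_1$ is independent of $\bar{A}_{k0}$ and $A_{k3}$. Applying this argument to all $k$-values,
we find that $\psi$ is of the form
$$\psi = e^{4\pi^2\sum_k \bar{A}_{k0}A_{k\kappa}/\hbar k_0 s_k}\,\chi, \tag{50}$$
where $\chi$ involves only the transverse components of $A_k$. In terms of continuous
$k$-values
$$\psi = e^{4\pi^2\int (\bar{A}_{k0}A_{k\kappa}/\hbar k_0)\,d^3k}\,\chi. \tag{51}$$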


We see in this way that the supplementary conditions fix the form of
the wave function $\psi$ so far as concerns the longitudinal degrees of freedom.
Thus the longitudinal degrees of freedom cannot play any important role in
the dynamical theory. This corresponds to their not being of physical importance.
Their only purpose is to give the theory a relativistic setting. The important part
of $\psi$ is the factor $\chi$ referring to the transverse degrees of freedom. This factor is the
same as the wave function in the theory of a field of radiation without interaction
with matter given on pp. 200ff.


78. Classical electrodynamics in Hamiltonian form* 
The foregoing theory must now be extended to take into account the interaction 
of the field of radiation with matter. This involves dealing with the dynamical 
system composed of a number of charged particles interacting with the 
electromagnetic field. Let us first consider this dynamical system classically and 
see how to put its equations of motion into Hamiltonian form. We shall then have 
a basis from which to build up a quantum theory by analogy. 


*[A four dimensional scalar product is introduced that uses emboldened round brackets and
a comma between the arguments.]
Each of the charged particles will describe a world-line in space-time in
the classical theory. We give the particles labels $i, j, \ldots$ and denote the coordinates
of a point on the world-line of the $i$th particle by $z_{\mu i}$. These coordinates are
functions of the proper-time $s_i$ of the $i$th particle, this proper-time being defined
so that its difference for two neighbouring points on the world-line satisfies
$$(ds_i)^2 = \boldsymbol{(}dz_i, dz_i\boldsymbol{)}, \qquad dz_{0i}/ds_i > 0. \tag{52}$$
The velocity 4-vector $v_i$ of the $i$th particle is defined by
$$v_i = dz_i/ds_i \tag{53}$$
and satisfies from (52)
$$\boldsymbol{(}v_i, v_i\boldsymbol{)} = 1, \qquad v_{0i} > 0. \tag{54}$$
The presence of charges changes the Maxwell equations (13) to
$$\Box\mathcal{A}_\mu = 4\pi j_\mu, \qquad \partial\mathcal{A}_\mu/\partial x_\mu = 0, \tag{55}$$


where $j_\mu$ is the 4-vector whose time component is the charge-density and whose
space components are the current density. For mathematical simplicity we suppose
the charge on each particle to be concentrated at one point. Then $j_\mu$ vanishes
everywhere except on the world-lines of the particles, where it has singularities
which can be described in terms of $\delta$ functions. The solution of (55) can be
written in the form
$$\mathcal{A}_\mu = A_{\mu,\mathrm{in}} + \sum_i \mathcal{A}_{\mu i,\mathrm{ret}}, \tag{56}$$
where $A_{\mu,\mathrm{in}}$ are the potentials of the incoming field of radiation which acts
on the particles and $\mathcal{A}_{\mu i,\mathrm{ret}}$ are the retarded potentials of the $i$th particle,
the summation in (56) being over all the particles. The potentials $A_{\mu,\mathrm{in}}$ satisfy
the equations for no charges, equations (13), and the $\mathcal{A}_{\mu i,\mathrm{ret}}$ are given by
$$\mathcal{A}_{\mu i,\mathrm{ret}} = e_i v_{\mu i}/\boldsymbol{(}v_i, x - z_i\boldsymbol{)}, \tag{57}$$
$e_i$ being the charge of the $i$th particle, and the variables $v_i$, $z_i$ in (57) being taken
at the retarded proper-time $s_i$ of the $i$th particle, for which
$$\boldsymbol{(}x - z_i, x - z_i\boldsymbol{)} = 0, \qquad x_0 - z_{0i} > 0. \tag{58}$$
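
Conditions (58) and formula (57) can be tried out on the simplest case. The sketch below is not from the original text. It assumes units with the velocity of light equal to 1, a particle moving uniformly along a straight world-line through the origin, a bisection search for the retarded proper-time, and hypothetical function names. 4-vectors are stored as tuples `(time, x, y, z)` of ordinary components, and the potential is returned in the same component convention as the velocity.

```python
import math


def scalar_product(a, b):
    """Four-dimensional scalar product a0*b0 - a1*b1 - a2*b2 - a3*b3."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]


def four_velocity(u):
    """Velocity 4-vector for an ordinary 3-velocity u with |u| < 1."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in u))
    return (gamma, gamma * u[0], gamma * u[1], gamma * u[2])


def worldline(s, u):
    """Point z(s) on a straight world-line through the origin (assumed motion)."""
    return tuple(s * c for c in four_velocity(u))


def retarded_proper_time(x, u, lo=-1e6, hi=1e6, iters=200):
    """Bisect for the s with (x - z, x - z) = 0 and x0 - z0 > 0."""
    def f(s):
        z = worldline(s, u)
        r = math.sqrt(sum((x[n] - z[n]) ** 2 for n in (1, 2, 3)))
        return (x[0] - z[0]) - r  # strictly decreasing in s for |u| < 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def retarded_potential(x, u, charge=1.0):
    """charge * v / (v, x - z), with v and z taken at the retarded proper-time."""
    s = retarded_proper_time(x, u)
    v = four_velocity(u)
    z = worldline(s, u)
    d = tuple(x[n] - z[n] for n in range(4))
    return tuple(charge * c / scalar_product(v, d) for c in v)
```

For a particle at rest the time component reduces to the Coulomb potential: `retarded_potential((5.0, 3.0, 0.0, 0.0), (0.0, 0.0, 0.0))` has time component close to 1/3, the reciprocal of the distance from the charge.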
As the equations of motion for the $i$th particle, we shall take Lorentz’s equations
$$m_i\,\frac{dv_{\mu i}}{ds_i} = e_i v_i^{\nu}\Bigl\{F_{\mu\nu,\mathrm{in}} + \sum_{j \neq i} F_{\mu\nu j,\mathrm{ret}} + \tfrac{1}{2}\bigl(F_{\mu\nu i,\mathrm{ret}} - F_{\mu\nu i,\mathrm{adv}}\bigr)\Bigr\}, \tag{59}$$
$m_i$ being the mass of the $i$th particle, $F_{\mu\nu,\mathrm{in}}$ and $F_{\mu\nu j,\mathrm{ret}}$ being the fields derived from
the potentials $A_{\mu,\mathrm{in}}$ and $\mathcal{A}_{\mu j,\mathrm{ret}}$ in accordance with (35), and $F_{\mu\nu i,\mathrm{adv}}$ being similarly
the field derived from the advanced potentials $\mathcal{A}_{\mu i,\mathrm{adv}}$ given by (57) and (58)
with the inequality in (58) reversed. The field functions on the right-hand side
of (59) are all to be taken at the point $x = z_i$ where the $i$th particle is situated.
The summation in (59) is over all the particles except the $i$th and shows that
all the other particles act on the $i$th through their retarded fields. The fields
$F_{\mu\nu i,\mathrm{ret}}$ and $F_{\mu\nu i,\mathrm{adv}}$ are infinitely great at the point $x = z_i$, but their difference is
finite, and this difference occurring in (59) gives the effect of radiation damping
on the motion of the particle.²

Our problem now is to put the equations of motion (59) into the Hamiltonian 
form. Let us first discuss in general terms what we should expect the Hamiltonian 
form to be like in a relativistic theory. We should not keep precisely to the form 
(14) or (15) of §28, since this puts the time on a different footing from the space 
coordinates. We should expect to have the proper-time appearing as independent 
variable, and since each particle has its own proper-time we must then have several 
independent variables. Each dynamical variable $\xi$ is thus in general a function
of the proper-times $s_i$ of all the particles and has a value only with respect to
a particular point on the world-line of each particle. The general concept of a
P.B. satisfying the laws (2)-(6) of §21 can be retained in a relativistic theory. We
shall need one Hamiltonian for each particle, the relativistic Hamiltonian $G_i$ of
the $i$th particle determining how dynamical variables vary with the independent
variable $s_i$, according to the equation
$$d\xi/ds_i = [\xi, G_i]. \tag{60}$$


In order that the various equations (60) for different $i$ shall be consistent they
must give
$$d^2\xi/ds_i\,ds_j = d^2\xi/ds_j\,ds_i, \qquad [\partial^2\xi/\partial s_i\,\partial s_j = \partial^2\xi/\partial s_j\,\partial s_i],{}^{*}$$
which requires that
$$[[\xi, G_j], G_i] = [[\xi, G_i], G_j]$$
or
$$[[G_i, G_j], \xi] = 0, \tag{61}$$
from (6) of §21. This must hold for any dynamical variable $\xi$, so we must have
$$[G_i, G_j] = \text{a number}. \tag{62}$$
Equations (60) and (62) give the general Hamiltonian form of the equations of
motion in a relativistic theory of several particles.

Let us consider the dynamical variables for our system of several charged
particles interacting with the electromagnetic field. The four coordinates $z_{\mu i}$
of the $i$th particle will provide four dynamical variables, the time coordinate
being treated on the same footing as the three space coordinates. The four
components $p_{\mu i}$ of the momentum-energy 4-vector of the $i$th particle will provide
more. As the obvious generalization of the P.B. relations between coordinates and
momenta in non-relativistic dynamics, we assume
$$[z_{\mu i}, z_{\nu j}] = 0, \qquad [p_{\mu i}, p_{\nu j}] = 0, \qquad [p_{\mu i}, z_{\nu j}] = g_{\mu\nu}\,\delta_{ij}. \tag{63}$$


²For a derivation of Lorentz’s equations in the form (59) and a discussion of their validity and
consequences, see Dirac, P. A. M. (1938). ‘Classical Theory of Radiating Electrons.’ Proceedings
of the Royal Society A: Mathematical, Physical and Engineering Sciences, 167(929), pp. 148-169.
doi:10.1098/rspa.1938.0124

*[Copied exactly from original followed by a suggested alternative expression.]
The variables $z_{\mu i}$ and $p_{\mu i}$ should depend only on the proper-time $s_i$ and should be
independent of the proper-times $s_j$ ($j \neq i$) of the other particles, so from (60)
we must have
$$[z_{\mu i}, G_j] = 0, \qquad [p_{\mu i}, G_j] = 0 \qquad (j \neq i). \tag{64}$$
We need also dynamical variables to describe the field. We take these to be
the potentials $A_\mu(x)$ at all points in space-time. The 4-vector $x$ here should
be looked upon as a parameter labelling these dynamical variables, there being
four of them for each $x$. Each of these dynamical variables $A_\mu(x)$ is a function
of the proper-times $s_i$. Thus all the $A_\mu(x)$ variables together provide a set of
potentials throughout space-time depending on a point on the world-line of each
particle. These potentials are therefore not the same as the Maxwell potentials
$\mathcal{A}_\mu(x)$ satisfying (55). We shall call them the Wentzel potentials.† They are closely
related to the Maxwell potentials, as will appear later.

Since a particle variable and a field variable refer to different degrees of freedom,
their P.B. must be zero, i.e.
$$[z_{\mu i}, A_\nu(x)] = 0, \qquad [p_{\mu i}, A_\nu(x)] = 0. \tag{65}$$
We need also the P.B. of two field variables. A value for this P.B. is provided by
the theory of radiation without interaction with matter, namely by equation (30)
considered classically. This equation as it stands, however, is not a satisfactory
one to use when there are charged particles present, as it causes certain infinite
terms to appear in the equations of motion of the particles. One must replace it by
$$[A_\mu(x), A_\nu(x')] = \tfrac{1}{2}\,g_{\mu\nu}\{\Delta(x - x' + \lambda) + \Delta(x - x' - \lambda)\}, \tag{66}$$
where $\lambda$ is a small 4-vector lying within the light-cone, i.e.
$$\boldsymbol{(}\lambda, \lambda\boldsymbol{)} > 0, \tag{67}$$
and is ultimately to be made to tend to zero. One must not make $\lambda \to 0$ too
early or one will get infinite terms appearing in the equations. With finite $\lambda$
the theory is not relativistic, as the direction of $\lambda$ provides a preferred direction
in space-time, but it will be found that as $\lambda \to 0$ the equations of motion become
independent of the direction of $\lambda$, so long as (67) is satisfied, so that in the limit
the theory is relativistic. Equations (63), (65) and (66) give the P.B.s of all our
dynamical variables.
We must now set up the Hamiltonians. We shall assume that 
$$G_i = (2m_i)^{-1}\{m_i^2 - (p_i - e_iA(z_i),\; p_i - e_iA(z_i))\} \tag{68}$$
and shall verify that these Hamiltonians lead to the correct equations of motion. 
Let us first test for consistency. We find from (63), (65) and (66) that
$$[G_i, G_j] = 0 \tag{69}$$
provided the conditions
$$(z_i - z_j \pm \lambda,\; z_i - z_j \pm \lambda) < 0 \qquad (i \neq j) \tag{70}$$
are fulfilled. These conditions mean that the independent variables $s_i$ are not
completely independent, but must be restricted so that the points which they
specify on the world-lines of the various particles each lie outside the light-cones
with the others as vertices (and remain so when shifted by the amount $\pm\lambda$).
Subject to these conditions the equations of motion are consistent. The dynamical
variables should now be considered as undefined for values of the $s_i$ which do not
fulfil (70).

†These potentials were first used to give Lorentz's equations of motion by Wentzel, Gregor
(1933). "Über die Eigenkräfte der Elementarteilchen. I." Zeitschrift für Physik, 86(7-8),
pp. 479-494. doi:10.1007/bf01341363
§[The second $p_i$ is emboldened instead of the original unemboldened.]
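
The conditions (70) can be checked mechanically for any given set of points. The following is a minimal sketch, assuming the metric signature (+, -, -, -) and points given as 4-tuples $(z_0, z_1, z_2, z_3)$; the function names `minkowski` and `satisfies_70` are illustrative only and do not come from the text.

```python
from itertools import combinations

def minkowski(a, b):
    """Scalar product (a, b) = a0*b0 - a1*b1 - a2*b2 - a3*b3 (assumed signature)."""
    return a[0] * b[0] - sum(x * y for x, y in zip(a[1:], b[1:]))

def satisfies_70(points, lam):
    """True if every pair z_i, z_j stays space-like separated after a shift by +lam and -lam.

    An unordered pair suffices: (z_j - z_i -/+ lam) has the same square
    as (z_i - z_j +/- lam).
    """
    for zi, zj in combinations(points, 2):
        for sign in (+1, -1):
            d = [a - b + sign * l for a, b, l in zip(zi, zj, lam)]
            if minkowski(d, d) >= 0:
                return False
    return True
```

For instance, two points whose separation is only barely space-like pass the test with $\lambda = 0$ but fail it once a small time-like $\lambda$ is switched on, which is the sense in which the points must "remain so when shifted".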

Let us consider now the equations of motion. We see at once that equations
(64) are satisfied. Putting $\xi = z_{\mu i}$ in (60), we get
$$v_{\mu i} = \frac{dz_{\mu i}}{ds_i} = -\frac{\partial G_i}{\partial p_i^\mu} = m_i^{-1}\{p_{\mu i} - e_iA_\mu(z_i)\}, \tag{71}$$
which is the usual relation between velocity and momentum for a charged particle.
From (54) and (68) we see now that
$$G_i = 0. \tag{72}$$


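Equations (69) show that the $G_i$ are all constants of the motion and (72) shows
that we must take these constants to be zero to get the equations of motion that
we want. Putting $\xi = p_{\mu i}$ in (60), we get
$$\frac{dp_{\mu i}}{ds_i} = e_iv_i^\nu\left[\frac{\partial A_\nu(x)}{\partial x^\mu}\right]_{x = z_i},$$
which reduces, with the help of (71), to
$$m_i\frac{dv_{\mu i}}{ds_i} = e_iv_i^\nu\left[\frac{\partial A_\nu}{\partial x^\mu} - \frac{\partial A_\mu}{\partial x^\nu}\right]_{x = z_i}. \tag{73}$$
This would be the same as Lorentz's equation (59) if we could arrange to have
$$A_\mu(x) = \mathcal{A}_{\mu,\mathrm{in}}(x) + \sum_{j \neq i}\mathcal{A}_{\mu j,\mathrm{ret}}(x) + \tfrac12\mathcal{A}_{\mu i,\mathrm{ret}}(x) - \tfrac12\mathcal{A}_{\mu i,\mathrm{adv}}(x) \tag{74}$$
for $x$ in the neighbourhood of $z_i$. Finally, putting $\xi = A_\mu(x)$ in (60), we get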


$$\frac{dA_\mu(x)}{ds_i} = m_i^{-1}e_i\{p_i^\nu - e_iA^\nu(z_i)\}[A_\mu(x), A_\nu(z_i)]
= \tfrac12 e_iv_{\mu i}\{\Delta(x - z_i + \lambda) + \Delta(x - z_i - \lambda)\}. \tag{75}$$
These equations for all $i$ can be integrated to give
$$A_\mu(x) = \tfrac12\sum_i e_i\int_{-\infty}^{s_i} v'_{\mu i}\{\Delta(x - z'_i + \lambda) + \Delta(x - z'_i - \lambda)\}\,ds'_i + a_\mu(x), \tag{76}$$
where $v'_i$, $z'_i$ are short for $v_i(s'_i)$, $z_i(s'_i)$ and $a_\mu(x)$ is a constant of the motion for
each $x$ and $\mu$, i.e. it is independent of the $s_i$. Equation (76) shows the form of
the Wentzel potentials $A_\mu(x)$ as functions of the $s_i$. These equations, it should
be remembered, hold only for values of the $s_i$ satisfying (70); for other values of
the $s_i$ the Wentzel potentials are undefined.
In order to see the significance of (76), let us study the integral
$$\int_{-\infty}^{s_i} v'_{\mu i}\,\Delta(x - z'_i)\,ds'_i. \tag{77}$$
If the point $x$ lies inside the future part of the light-cone of $z_i$ at the proper-time
$s_i$, i.e. if
$$(x - z_i,\, x - z_i) > 0, \qquad x_0 - z_{0i} > 0, \tag{78}$$
then (77) vanishes, since the $\Delta$ function vanishes throughout the domain of
integration. If the point $x$ lies outside the light-cone of $z_i$, i.e. if
$$(x - z_i,\, x - z_i) < 0, \tag{79}$$
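there is just one value of $s'_i$ in the domain of integration for which the $\Delta$ function
does not vanish, namely the retarded proper-time for the field point $x$. The integral
(77) is then equal to, with the help of (6),
$$\int v'_{\mu i}\,\Delta(x - z'_i)\,ds'_i = 2\int v'_{\mu i}\,\delta(x - z'_i,\, x - z'_i)\,ds'_i.$$
Now
$$\frac{d}{ds'_i}(x - z'_i,\, x - z'_i) = -2(v'_i,\, x - z'_i) = -2\rho,$$
where $\rho$ is a positive number. The integral now becomes
$$-\int_{\infty}^{(x - z_i,\,x - z_i)} \frac{v'_{\mu i}}{(v'_i,\, x - z'_i)}\,\delta(x - z'_i,\, x - z'_i)\,d(x - z'_i,\, x - z'_i) = \frac{v_{\mu i}}{(v_i,\, x - z_i)}$$
taken at the retarded proper-time. Thus from (57)
$$e_i\int_{-\infty}^{s_i} v'_{\mu i}\,\Delta(x - z'_i)\,ds'_i = \mathcal{A}_{\mu i,\mathrm{ret}}(x). \tag{80}$$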
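If the point $x$ lies inside the past part of the light-cone of $z_i$, i.e. if
$$(x - z_i,\, x - z_i) > 0, \qquad x_0 - z_{0i} < 0, \tag{81}$$
there are two values of $s'_i$ for which the $\Delta$ function does not vanish,
namely the retarded and advanced proper-times. The contribution of the retarded
proper-time to the integral (77) is the same as in the preceding case;
the contribution of the advanced proper-time may be worked out by the same
method and is, when multiplied by $e_i$, $-\mathcal{A}_{\mu i,\mathrm{adv}}(x)$. Summing up our results,
we have
$$e_i\int_{-\infty}^{s_i} v'_{\mu i}\,\Delta(x - z'_i)\,ds'_i =
\begin{cases}
0 & \text{when (78) holds,}\\
\mathcal{A}_{\mu i,\mathrm{ret}}(x) & \text{when (79) holds,}\\
\mathcal{A}_{\mu i,\mathrm{ret}}(x) - \mathcal{A}_{\mu i,\mathrm{adv}}(x) & \text{when (81) holds.}
\end{cases} \tag{82}$$
Substituting the results (82) with $x \pm \lambda$ for $x$ into (76) we find, for $x$ very close
to $z_i$ (close compared to $\lambda$), taking into account (70) and (67) and taking $\lambda_0 > 0$,
$$A_\mu(x) = \tfrac12\sum_{j \neq i}\{\mathcal{A}_{\mu j,\mathrm{ret}}(x + \lambda) + \mathcal{A}_{\mu j,\mathrm{ret}}(x - \lambda)\}
+ \tfrac12\mathcal{A}_{\mu i,\mathrm{ret}}(x - \lambda) - \tfrac12\mathcal{A}_{\mu i,\mathrm{adv}}(x - \lambda) + a_\mu(x).$$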
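
The bookkeeping behind the last equation can be sketched in a few lines: classify the shifted field point by the three cases (78), (79), (81), look up the combination of retarded and advanced potentials that (82) assigns to it, and average over the two shifts as in (76). This is only an illustration under the assumptions of signature (+, -, -, -) and $\lambda_0 > 0$; the names `region`, `COEFFS` and `self_field_coeffs` are hypothetical.

```python
def minkowski(a, b):
    """Scalar product with the assumed signature (+, -, -, -)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

def region(x, z):
    """Position of x relative to the light-cone with vertex z."""
    d = [xa - za for xa, za in zip(x, z)]
    s = minkowski(d, d)
    if s > 0:
        return "future" if d[0] > 0 else "past"   # cases (78) and (81)
    if s < 0:
        return "outside"                          # case (79)
    return "on the cone"

# Coefficients (of the retarded, of the advanced potential) taken from (82).
COEFFS = {"future": (0, 0), "outside": (1, 0), "past": (1, -1)}

def self_field_coeffs(z, lam):
    """Coefficients contributed by particle i to A(x) at x = z_i, via the two shifts in (76)."""
    total_ret, total_adv = 0.0, 0.0
    for sign in (+1, -1):
        x = [za + sign * l for za, l in zip(z, lam)]
        c_ret, c_adv = COEFFS[region(x, z)]
        total_ret += 0.5 * c_ret
        total_adv += 0.5 * c_adv
    return total_ret, total_adv
```

With a time-like $\lambda$ the point $z_i + \lambda$ falls in the future cone and contributes nothing, while $z_i - \lambda$ falls in the past cone, so the particle's own field enters as half the retarded minus half the advanced potential.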
If we take
$$a_\mu(x) = \mathcal{A}_{\mu,\mathrm{in}}(x), \tag{83}$$
this agrees with (74) in the limit $\lambda \to 0$. Thus the choice (83) for the constants of
the motion $a_\mu(x)$, a choice which is permissible since neither side of the equation
depends on the $s_i$, results in the equations of motion for all the particles becoming
the Lorentz equations in the limit $\lambda \to 0$.

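The ingoing potentials $\mathcal{A}_{\mu,\mathrm{in}}$ must satisfy the equations (13) but are otherwise
undetermined. Thus the constants of the motion $a_\mu(x)$ must satisfy
$$\Box a_\mu(x) = 0, \qquad \partial a_\mu(x)/\partial x_\mu = 0 \tag{84}$$
but are otherwise arbitrary. Inserting these conditions in (76) we find, with the
help of (11),
$$\Box A_\mu(x) = 0, \tag{85}$$
$$\begin{aligned}
\frac{\partial A_\mu(x)}{\partial x_\mu}
&= \tfrac12\sum_i e_i\int_{-\infty}^{s_i} v'_{\mu i}\,\frac{\partial}{\partial x_\mu}\{\Delta(x - z'_i + \lambda) + \Delta(x - z'_i - \lambda)\}\,ds'_i\\
&= -\tfrac12\sum_i e_i\int_{-\infty}^{s_i} \frac{d}{ds'_i}\{\Delta(x - z'_i + \lambda) + \Delta(x - z'_i - \lambda)\}\,ds'_i\\
&= -\tfrac12\sum_i e_i\{\Delta(x - z_i + \lambda) + \Delta(x - z_i - \lambda)\}.
\end{aligned} \tag{86}$$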
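
The middle step of (86) rests on a chain-rule identity that may be worth writing out. The sketch below assumes, as the integration in (76) implicitly does, that the $\Delta$ function vanishes at the lower limit $s'_i \to -\infty$.

```latex
\begin{aligned}
v'_{\mu i}\,\frac{\partial}{\partial x_\mu}\,\Delta(x - z'_i \pm \lambda)
  &= -\frac{dz'_{\mu i}}{ds'_i}\,\frac{\partial}{\partial z'_{\mu i}}\,\Delta(x - z'_i \pm \lambda)
   = -\frac{d}{ds'_i}\,\Delta(x - z'_i \pm \lambda), \\
\int_{-\infty}^{s_i}\frac{d}{ds'_i}\,\Delta(x - z'_i \pm \lambda)\,ds'_i
  &= \Delta(x - z_i \pm \lambda) - 0 .
\end{aligned}
```

The first line uses only that $\Delta$ depends on $x$ and $z'_i$ through their difference; the second is the fundamental theorem of calculus applied along the world-line.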


The work of this section can be summed up as follows. To describe a number of 
charged particles interacting with the electromagnetic field we need the dynamical 
variables $z_{\mu i}$, $p_{\mu i}$, $A_\mu(x)$ satisfying the P.B. relations (63), (65), (66). The equations
of motion then take the Hamiltonian form (60) with the Hamiltonians $G_i$ given
by (68), provided one imposes certain conditions on some of the constants of
the motion, namely the $G_i$'s must vanish and equations (85) and (86) must hold.

The equations (85) and (86) for the Wentzel potentials $A_\mu$ should be compared
with the equations (55) for the Maxwell potentials $\mathcal{A}_\mu$. Of the two equations (13)
satisfied by the electromagnetic potentials in the absence of charges, the first gets
modified by the presence of charges in the case of the Maxwell potentials and
the second in the case of the Wentzel potentials. For a field point $x$ lying outside
the light-cone of all the electron points $z_i$, each of the integrals in (76) is given
by (80) and the right-hand side of (76) becomes equal to the right-hand side of
(56) in the limit $\lambda \to 0$. Thus for this domain of $x$ the Wentzel and the Maxwell
potentials are equal.


79. Passage to the quantum theory 

Let us now construct a quantum theory analogous to the classical theory
of the preceding section. We use the same dynamical variables as before,
namely the particle coordinates $z_{\mu i}$ and momenta $p_{\mu i}$ and the Wentzel
potentials $A_\mu(x)$, and assume them to satisfy quantum conditions corresponding to
their having the same P.B.s as in the classical theory, given by (63), (65) and (66).
The classical Hamiltonians (68) should be replaced by Hamiltonians of the form
given in the preceding chapter, applying to particles with a spin $\tfrac12\hbar$, in order
to get satisfactory relativistic wave equations. Thus we must introduce further
dynamical variables to describe the spins. For the $i$th particle we need the spin
variables $\alpha_{ri}$ ($r = 1, 2, 3$) and $\alpha_{mi}$, which anticommute with one another and have
their squares equal to unity, and which commute with all the $z_{\mu j}$, $p_{\mu j}$ and $A_\mu(x)$
variables, and also with the spin variables of the other particles. We can then set
up Hamiltonians of the form of the operator in (9) and (10) of §67,
$$G_i = p_{0i} - e_iA_0(z_i) + (\boldsymbol{\alpha}_i,\; \mathbf{p}_i - e_i\mathbf{A}_{z_i}) + \alpha_{mi}m_i, \tag{87}$$
to replace the classical Hamiltonians (68), $\mathbf{A}_{z_i}$ being written instead of $\mathbf{A}(z_i)$ in
the three-dimensional scalar product.

We describe a state of motion of the whole system of particles and field by
a wave function in the coordinates and times $z_{\mu i}$ of the particles, which wave
function is a ket in the other degrees of freedom, i.e. those of the field and of
the spins of the particles. Following the notation of the end of §20, we write this
wave-function-ket as $|z\rangle$. It must satisfy the wave equations
$$G_i\,|z\rangle = 0, \tag{88}$$
which may be looked upon as supplementary conditions corresponding to
the classical equations (72). For the various equations (88) to be consistent
we need, by an application of (45),
$$[G_i, G_j]\,|z\rangle = 0, \tag{89}$$
a rather more stringent condition than the classical consistency condition (62).
With the Hamiltonians (87), $[G_i, G_j] = 0$ when (70) holds and the condition (89)
is then satisfied. The conditions (70) can be brought in by supposing that $|z\rangle$ is
defined only for values of the $z$-variables satisfying (70), so that it is only in this
domain of definition of $|z\rangle$ that equations (89) have to hold. The wave equations
(88) are consistent in this domain.

The remaining equations of the classical theory, equations (85) and (86),
must now be taken over into the quantum theory. Equation (85) may be
assumed to hold unchanged in the quantum theory, as it does not give rise to any
inconsistency because its left-hand side commutes with all the dynamical variables.
Equation (86) must be replaced by a supplementary condition, as otherwise it
would lead to inconsistencies. Defining $R(x)$ by
$$R(x) = \partial A_\mu(x)/\partial x_\mu + \tfrac12\sum_i e_i\{\Delta(x - z_i + \lambda) + \Delta(x - z_i - \lambda)\}, \tag{90}$$
we take the supplementary condition
$$R(x)\,|z\rangle = 0 \tag{91}$$
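holding for all $x$ as the quantum analogue of the classical equation (86). It is
a generalization of the supplementary condition (43) for no charges. We have,
using (66),
$$\begin{aligned}
[R(x),\, p_{\mu i} - e_iA_\mu(z_i)] &= -\partial R(x)/\partial z_i^\mu - e_i[R(x), A_\mu(z_i)]\\
&= \tfrac12 e_i\frac{\partial}{\partial x^\mu}\{\Delta(x - z_i + \lambda) + \Delta(x - z_i - \lambda)\}
 - \tfrac12 e_i\frac{\partial}{\partial x^\mu}\{\Delta(x - z_i + \lambda) + \Delta(x - z_i - \lambda)\}\\
&= 0,
\end{aligned} \tag{92}$$
so that from (68)
$$[R(x), G_i] = 0. \tag{93}$$
Thus the supplementary conditions (88) and (91) are consistent. Again
$$\begin{aligned}
[R(x), R(x')] &= \left[\frac{\partial A_\mu(x)}{\partial x_\mu},\, \frac{\partial A_\nu(x')}{\partial x'_\nu}\right]
= \tfrac12\frac{\partial^2}{\partial x_\mu\,\partial x'^\mu}\{\Delta(x - x' + \lambda) + \Delta(x - x' - \lambda)\}\\
&= -\tfrac12\Box\{\Delta(x - x' + \lambda) + \Delta(x - x' - \lambda)\} = 0
\end{aligned} \tag{94}$$
from (11), so that the various supplementary conditions (91) obtained by putting
different values for $x$ are consistent with one another.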

We now have the complete scheme of quantum equations corresponding to
the classical theory of the preceding section, namely the P.B. relations (63), (65)
and (66) together with the equations (85), (88) and (91), and have verified that
they are all consistent for the domain of the $z$'s for which (70) holds. If some of
the particles are of the same kind and are bosons or fermions, the further conditions
must be imposed that $|z\rangle$ is symmetrical or antisymmetrical, as the case may be,
between the coordinates (and spin variables) of the similar particles.

The wave-function-ket $|z\rangle$, if normalized, has the physical interpretation that
$\langle z|z\rangle$ is the probability, per unit three-dimensional volume for each particle, of each
particle being in the neighbourhood of the place fixed by its coordinates $z_{1i}$, $z_{2i}$,
$z_{3i}$ at the time $z_{0i}$. The theory allows one to calculate this probability, for any
state of motion of the system, only provided the conditions (70) are satisfied,
which means, in the limit $\lambda \to 0$, that the points $z_i$ in space-time must each be
outside the light-cones of the others. The observations of whether the particles
are at the places $z_{1i}$, $z_{2i}$, $z_{3i}$ at the times $z_{0i}$ are thus compatible observations only
provided the points $z_i$ in space-time are outside each other's light-cones. This result
of the theory is to be expected on general physical grounds, since the observation
of whether a particle is at a particular place at a particular time may be
expected to produce a disturbance throughout that region of space-time lying
inside the future light-cone of the particular place and time.
Equation (85) enables us to resolve the potentials into their Fourier components
according to
$$A_\mu(x) = \int\{A_{k\mu}e^{i(kx)} + \bar A_{k\mu}e^{-i(kx)}\}\,k_0^{-1}\,d^3k \tag{95}$$
with
$$k_0 = |\mathbf{k}|, \tag{96}$$
as in the case of no charges. The Fourier coefficients $A_{k\mu}$ no longer satisfy
the commutation relations (27) on account of the occurrence of $\lambda$ in (66). They still
satisfy (28) and instead of (27) they satisfy†
$$\bar A_{k\mu}A_{k'\nu} - A_{k'\nu}\bar A_{k\mu} = -(g_{\mu\nu}/4\pi^2)\,\hbar k_0\cos(k\lambda)\,\delta_3(\mathbf{k} - \mathbf{k}'), \tag{97}$$
as may be verified by noting that (28) and (97) lead to equation (29) with the extra
factor $\cos(k\lambda)$ in the integrand and this extra factor makes equation (29) lead to
(66) instead of (30).

It is convenient to redefine $A_{k\mu}$ for those values of $\mathbf{k}$ for which $\cos(k\lambda)$ is
negative so that
$$\text{new } A_{k\mu} = -\,\text{old } \bar A_{-k\mu}.$$
Thus the new Fourier coefficient $A_{k\mu}$ exists when $k_0\cos(k\lambda) > 0$. With $\lambda$
very small, the redefinition affects only Fourier coefficients with very large $k$-values.
With the new $A_{k\mu}$ equation (95) still holds if (96) is replaced by‡
$$k_0 = |\mathbf{k}|\,|\cos(k\lambda)| / \cos(k\lambda) \tag{98}$$
and (97) holds unchanged. The right-hand side of (97) with $\mathbf{k} = \mathbf{k}'$ is now always
positive for $\mu = \nu = 1$, 2, or 3 and negative for $\mu = \nu = 0$. This enables
us to express any ket in the degrees of freedom of the field as a power series
in the variables $A_{k1}$, $A_{k2}$, $A_{k3}$, and $\bar A_{k0}$ multiplied into the standard ket $\rangle_F$,
corresponding to no energy in each of the degrees of freedom, as we had at the end
of §76. Expressing the wave-function-ket $|z\rangle$ in this way, we have
$$|z\rangle = \psi\rangle_F \tag{99}$$
where $\psi$ is a power series in the variables $A_{k1}$, $A_{k2}$, $A_{k3}$, $\bar A_{k0}$, whose coefficients are
each a wave function in the $z$-variables and a ket in the spin degrees of freedom.
These coefficients correspond to there being different numbers of photons in
the various degrees of freedom of the field.
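
The rule (98) can be tried out numerically. The sketch below, with the hypothetical helper `k0_solutions` and the assumed signature (+, -, -, -), lists the values $k_0 = \pm|\mathbf{k}|$ for which $k_0\cos(k\lambda) > 0$. When $\lambda$ lies along the time axis there is exactly one such value for each $\mathbf{k}$; otherwise there may be none or two.

```python
import math

def minkowski(a, b):
    """Scalar product with the assumed signature (+, -, -, -)."""
    return a[0] * b[0] - sum(x * y for x, y in zip(a[1:], b[1:]))

def k0_solutions(k_vec, lam):
    """Values k0 = +|k| or -|k| satisfying k0 * cos(k.lam) > 0, i.e. condition (98)."""
    norm = math.sqrt(sum(c * c for c in k_vec))
    solutions = []
    for k0 in (norm, -norm):
        k = (k0, *k_vec)
        if k0 * math.cos(minkowski(k, lam)) > 0:
            solutions.append(k0)
    return solutions
```

For small $|\mathbf{k}|$ the cosine is positive and the usual $k_0 = |\mathbf{k}|$ is recovered, in line with the remark that the redefinition affects only very large $k$-values.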


tT replaced by brackets.] 

‡If $\lambda$ does not lie along the time axis there are some regions of $(k_1 k_2 k_3)$-space for which
there is no $k_0$ satisfying (98) and others for which there are two. The integral (95), and similar
integrals in the future, are then to be understood as taken over the domain of $(k_1 k_2 k_3)$-space
for which (98) has a solution and as summed over both values of the integrand for that part of
the domain for which (98) has two solutions. From the four-dimensional point of view, the domain
of integration is that part of the light-cone $(kk) = 0$ for which $k_0\cos(k\lambda) > 0$, and is Lorentz
invariant for a given $\lambda$.
80. Elimination of the longitudinal waves

The electromagnetic field in the foregoing electrodynamical theory, both
classical and quantum, involves longitudinal waves as well as transverse ones.
The potentials $A_\mu(x)$ may be expressed as
$$A_\mu(x) = L_\mu(x) + M_\mu(x), \tag{100}$$
where $L_\mu(x)$ are the potentials of the longitudinal waves and $M_\mu(x)$ those of
the transverse waves. The longitudinal waves are made up of the components
$A_{k0}$ and $A_{k3}$ of the Fourier component $A_{k\mu}$, as discussed in §76. Here $A_{k3}$ is
the component of the three-dimensional vector $A_{kr}$ ($r = 1, 2, 3$) in the direction of
the three-dimensional vector $k_r$, so that, expressed as a three-dimensional vector,
it equals $(\mathbf{k}, \mathbf{A}_k)\,\mathbf{k}\,k_0^{-2}$. Thus
$$L_0(x) = A_0(x), \qquad
L_r(x) = \int\{(\mathbf{k}, \mathbf{A}_k)e^{i(kx)} + (\mathbf{k}, \bar{\mathbf{A}}_k)e^{-i(kx)}\}\,k_r k_0^{-3}\,d^3k. \tag{101}$$
These equations fix the longitudinal part of the potentials, and the transverse part
is then fixed by (100), i.e.
$$M_0(x) = 0, \qquad M_r(x) = A_r(x) - L_r(x). \tag{102}$$

The longitudinal waves are not physically important. They can be eliminated
from the equations by a certain mathematical transformation, which forms
a generalization of the method which led to equation (51) for the case of no charges.
The equations are thereby simplified and brought into more direct connexion with
experiment, but they lose their relativistic form, as the separation of the field into
longitudinal and transverse waves is not Lorentz invariant.
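
For a single Fourier component the split (100) to (102) of the spatial part is an ordinary vector projection onto the direction of $\mathbf{k}$. A minimal sketch, using plain tuples for three-dimensional vectors and the illustrative names `dot3` and `split_longitudinal_transverse`:

```python
def dot3(a, b):
    """Three-dimensional scalar product (a, b)."""
    return sum(x * y for x, y in zip(a, b))

def split_longitudinal_transverse(k, a):
    """Split the 3-vector a into its part along k and its part perpendicular to k."""
    coeff = dot3(k, a) / dot3(k, k)          # (k, A_k) / |k|^2
    longitudinal = tuple(coeff * c for c in k)
    transverse = tuple(x - y for x, y in zip(a, longitudinal))
    return longitudinal, transverse
```

For example, `split_longitudinal_transverse((0, 0, 2), (1, 2, 3))` separates the vector into `(0.0, 0.0, 3.0)` and `(1.0, 2.0, 0.0)`, and the transverse part has zero scalar product with $\mathbf{k}$.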

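By making a Fourier resolution of the left-hand side of equation (91) we get,
with the help of (10), the equations
$$\begin{aligned}
\Big\{k^\mu A_{k\mu} - (\cos(k\lambda)/4\pi^2)\sum_i e_ie^{-i(kz_i)}\Big\}\,|z\rangle &= 0,\\
\Big\{k^\mu\bar A_{k\mu} - (\cos(k\lambda)/4\pi^2)\sum_i e_ie^{i(kz_i)}\Big\}\,|z\rangle &= 0,
\end{aligned} \tag{103}$$
forming the generalization of (48). If for the moment we take discrete $k$-values,
the commutation relations (97) become, from (26),
$$\bar A_{k\mu}A_{k'\nu} - A_{k'\nu}\bar A_{k\mu} = -(g_{\mu\nu}/4\pi^2)\,\hbar k_0\cos(k\lambda)\,s_k\delta_{kk'}, \tag{104}$$
and show us that, with the notation (99),
$$A_{k0}\,\psi\rangle_F = \frac{\hbar k_0\cos(k\lambda)\,s_k}{4\pi^2}\,\frac{\partial\psi}{\partial\bar A_{k0}}\rangle_F, \qquad
\bar A_{k3}\,\psi\rangle_F = \frac{\hbar k_0\cos(k\lambda)\,s_k}{4\pi^2}\,\frac{\partial\psi}{\partial A_{k3}}\rangle_F. \tag{105}$$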
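Thus equations (103) become, on multiplication by $4\pi^2/\cos(k\lambda)$,
$$\begin{aligned}
\Big\{\hbar k_0^2s_k\frac{\partial}{\partial\bar A_{k0}} - \frac{4\pi^2}{\cos(k\lambda)}(\mathbf{k}, \mathbf{A}_k) - \sum_j e_je^{-i(kz_j)}\Big\}\psi\rangle_F &= 0,\\
\Big\{-\hbar k_0s_k\Big(\mathbf{k}, \frac{\partial}{\partial\mathbf{A}_k}\Big) + \frac{4\pi^2}{\cos(k\lambda)}k_0\bar A_{k0} - \sum_j e_je^{i(kz_j)}\Big\}\psi\rangle_F &= 0.
\end{aligned}$$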
These equations holding for all $k$ show that $\psi$ is of the form
$$\psi = e^{S/\hbar}\chi_1, \tag{106}$$
where
$$S = \sum_k k_0^{-2}s_k^{-1}\Big\{4\pi^2(\mathbf{k}, \mathbf{A}_k)\bar A_{k0}/\cos(k\lambda) + \sum_j e_j\big[\bar A_{k0}e^{-i(kz_j)} - k_0^{-1}(\mathbf{k}, \mathbf{A}_k)e^{i(kz_j)}\big]\Big\}$$
and $\chi_1$ is independent of $A_{k3}$ and $\bar A_{k0}$. Passing back to continuous $k$-values, we find
that $\psi$ is still of the form (106) with $S$ given by
$$S = \int\Big\{4\pi^2(\mathbf{k}, \mathbf{A}_k)\bar A_{k0}/\cos(k\lambda) + \sum_j e_j\big[\bar A_{k0}e^{-i(kz_j)} - k_0^{-1}(\mathbf{k}, \mathbf{A}_k)e^{i(kz_j)}\big]\Big\}\,k_0^{-2}\,d^3k. \tag{107}$$
Thus, as in the case of no charges, we find that the form of the wave function $\psi$ is
fixed so far as concerns the longitudinal degrees of freedom. The important part of
$\psi$ is the factor $\chi_1$, which involves only the transverse components of $A_{k\mu}$, together
with the $z$'s and spin variables.

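We may look upon $\chi_1$ as a wave function from which the longitudinal waves
have been eliminated. We can obtain wave equations for $\chi_1$ in the following way.
We have
$$\begin{aligned}
p_{\mu j}\psi &= e^{S/\hbar}\{p_{\mu j} + i\,\partial S/\partial z_j^\mu\}\chi_1\\
&= e^{S/\hbar}\Big\{p_{\mu j} + e_j\int k_\mu\big[\bar A_{k0}e^{-i(kz_j)} + k_0^{-1}(\mathbf{k}, \mathbf{A}_k)e^{i(kz_j)}\big]k_0^{-2}\,d^3k\Big\}\chi_1.
\end{aligned} \tag{108}$$
Using this result for $\mu = 0$, we get
$$\begin{aligned}
\{p_{0j} - e_jL_0(z_j)\}\psi\rangle_F
&= \Big\{p_{0j} - e_j\int\big[A_{k0}e^{i(kz_j)} + \bar A_{k0}e^{-i(kz_j)}\big]k_0^{-1}\,d^3k\Big\}e^{S/\hbar}\chi_1\rangle_F\\
&= e^{S/\hbar}p_{0j}\chi_1\rangle_F + e_j\int\big[(\mathbf{k}, \mathbf{A}_k) - k_0A_{k0}\big]e^{i(kz_j)}k_0^{-2}\,d^3k\;e^{S/\hbar}\chi_1\rangle_F\\
&= e^{S/\hbar}p_{0j}\chi_1\rangle_F - (e_j/4\pi^2)\sum_i e_i\int\cos(k\lambda)\,e^{i(k,\,z_j - z_i)}k_0^{-2}\,d^3k\;e^{S/\hbar}\chi_1\rangle_F
\end{aligned}$$
with the help of the first of equations (103). Again, using (101) and (108)
with $\mu = r$, we get
$$\begin{aligned}
\{p_{rj} - e_jL_r(z_j)\}\psi\rangle_F
&= \Big\{p_{rj} - e_j\int\big[(\mathbf{k}, \mathbf{A}_k)e^{i(kz_j)} + (\mathbf{k}, \bar{\mathbf{A}}_k)e^{-i(kz_j)}\big]k_rk_0^{-3}\,d^3k\Big\}e^{S/\hbar}\chi_1\rangle_F\\
&= e^{S/\hbar}p_{rj}\chi_1\rangle_F + e_j\int\big[k_0\bar A_{k0} - (\mathbf{k}, \bar{\mathbf{A}}_k)\big]e^{-i(kz_j)}k_rk_0^{-3}\,d^3k\;e^{S/\hbar}\chi_1\rangle_F\\
&= e^{S/\hbar}p_{rj}\chi_1\rangle_F + (e_j/4\pi^2)\sum_i e_i\int\cos(k\lambda)\,e^{-i(k,\,z_j - z_i)}k_rk_0^{-3}\,d^3k\;e^{S/\hbar}\chi_1\rangle_F
\end{aligned}$$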

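with the help of the second of equations (103). These equations may be
combined as
$$\{p_{\mu j} - e_jL_\mu(z_j)\}\psi\rangle_F = e^{S/\hbar}\{p_{\mu j} - e_jB_\mu(z_j)\}\chi_1\rangle_F \tag{109}$$
where
$$\begin{aligned}
B_0(x) &= (1/4\pi^2)\sum_i e_i\int\cos(k\lambda)\,e^{i(k,\,x - z_i)}\,k_0^{-2}\,d^3k,\\
B_r(x) &= -(1/4\pi^2)\sum_i e_i\int\cos(k\lambda)\,e^{-i(k,\,x - z_i)}\,k_rk_0^{-3}\,d^3k.
\end{aligned}$$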


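The equations may be simplified by a further transformation, namely
$$\chi_1 = e^{\Gamma/\hbar}\chi, \tag{110}$$
where
$$\Gamma = -(1/8\pi^2)\sum_{i,j}e_ie_j\int\cos(k\lambda)\cos(k,\,z_i - z_j)\,k_0^{-3}\,d^3k. \tag{111}$$
Equations (109) go over into
$$\begin{aligned}
\{p_{\mu j} - e_jL_\mu(z_j)\}\psi\rangle_F
&= e^{S/\hbar}e^{\Gamma/\hbar}\{p_{\mu j} - e_jB_\mu(z_j) + i\,\partial\Gamma/\partial z_j^\mu\}\chi\rangle_F\\
&= e^{S/\hbar}e^{\Gamma/\hbar}\{p_{\mu j} - e_jb_\mu(z_j)\}\chi\rangle_F,
\end{aligned} \tag{112}$$
where
$$\begin{aligned}
b_\mu(x) &= B_\mu(x) - (i/4\pi^2)\sum_i e_i\int\cos(k\lambda)\sin(k,\,x - z_i)\,k_\mu k_0^{-3}\,d^3k\\
&= \pm(1/4\pi^2)\sum_i e_i\int\cos(k\lambda)\cos(k,\,x - z_i)\,k_\mu k_0^{-3}\,d^3k,
\end{aligned} \tag{113}$$
the + or − sign being taken according to whether $\mu$ is zero or not.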
With the help of (100) and (112), the wave equations (87), (88) go over into
$$\{p_{0j} - e_jb_0(z_j) + (\boldsymbol{\alpha}_j,\; \mathbf{p}_j - e_j\mathbf{b}_{z_j} - e_j\mathbf{M}_{z_j}) + \alpha_{mj}m_j\}\chi\rangle_F = 0. \tag{114}$$
The variables describing the longitudinal waves have all disappeared from
these equations. We may take $\chi$ as the wave function for the theory in which
the longitudinal waves have been eliminated (it is rather more convenient for
this purpose than $\chi_1$), and equations (114) are the wave equations which it has
to satisfy. The influence of the longitudinal waves now shows itself up through
the functions $b_\mu(z_j)$ of the particle variables appearing in the Hamiltonians.
The supplementary conditions (91) have been satisfied through our using (106),
and drop out of the present formulation of the theory.
To work out the function $b_\mu(x)$ we must evaluate integrals of the form
$$I_\mu(x) = \int\cos(kx)\,k_\mu k_0^{-3}\,d^3k \tag{115}$$
for a general 4-vector $x$, with $k_0$ given by (98). Since the integrand in (115) is
unchanged when $-k$ is put for $k$, the integral is equal to
$$I_\mu(x) = \tfrac12\sum_{k_0}\int\cos(kx)\,k_\mu k_0^{-3}\,d^3k,$$
where $\sum_{k_0}$ means summing over both values $\pm|\mathbf{k}|$ for $k_0$. Thus $I_\mu(x)$ equals
$$I_\mu(x) = \tfrac12\int\Delta(k)\cos(kx)\,k_\mu k_0^{-2}\,d^4k.$$
This integral may be evaluated most conveniently from formula (10),
which gives us, on taking the real part of both sides,
$$\tfrac12\int\Delta(k)\sin(kx)\,d^4k = 2\pi^2\Delta(x)
= 2\pi^2|\mathbf{x}|^{-1}\{\delta(x_0 - |\mathbf{x}|) - \delta(x_0 + |\mathbf{x}|)\}.$$
Integrating both sides here with respect to $x_0$, we find
$$I_0(x) = \tfrac12\int\Delta(k)\cos(kx)\,k_0^{-1}\,d^4k =
\begin{cases}
0 & \text{for } (xx) > 0,\\
2\pi^2|\mathbf{x}|^{-1} & \text{for } (xx) < 0,
\end{cases} \tag{116}$$
the constant of integration being fixed by the condition that $I_0(x)$ vanishes for
$x_0 \to +\infty$ with $x_1$, $x_2$, $x_3$ fixed. Integrating (116) with respect to $x_0$, we find
$$\tfrac12\int\Delta(k)\sin(kx)\,k_0^{-2}\,d^4k =
\begin{cases}
-2\pi^2 & \text{for } (xx) > 0,\ x_0 < 0,\\
2\pi^2x_0|\mathbf{x}|^{-1} & \text{for } (xx) < 0,\\
2\pi^2 & \text{for } (xx) > 0,\ x_0 > 0,
\end{cases}$$
the constant of integration being fixed by the condition that the integral vanishes
for $x_0 = 0$. Differentiating with respect to $x_r$, we get
$$I_r(x) = \tfrac12\int\Delta(k)\cos(kx)\,k_rk_0^{-2}\,d^4k =
\begin{cases}
0 & \text{for } (xx) > 0,\\
2\pi^2x_0x_r|\mathbf{x}|^{-3} & \text{for } (xx) < 0.
\end{cases} \tag{117}$$
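
The space-like branch of (116) can be checked numerically in the simplest case $x_0 = 0$, $\lambda = 0$, where $I_0$ reduces to $\int\cos(\mathbf{k}\cdot\mathbf{x})\,|\mathbf{k}|^{-2}\,d^3k$. Averaging over the angles leaves $4\pi\int_0^\infty \sin(kr)/(kr)\,dk$ with $r = |\mathbf{x}|$. The sketch below is not part of the original argument: it inserts a convergence factor $e^{-\epsilon k}$ (an assumption made here for numerical convenience), for which the exact value is $4\pi\arctan(r/\epsilon)/r$, tending to $2\pi^2/r$ as $\epsilon \to 0$.

```python
import math

def damped_radial_integral(r, eps=0.1, k_max=300.0, h=0.002):
    """Midpoint-rule value of the integral of exp(-eps*k) * sin(k*r)/(k*r) for k from 0 to k_max."""
    n = int(k_max / h)
    total = 0.0
    for m in range(n):
        k = (m + 0.5) * h                     # midpoints avoid k = 0
        total += math.exp(-eps * k) * math.sin(k * r) / (k * r)
    return total * h

def I0_equal_time(r, eps=0.1):
    """4*pi times the damped radial integral, approximating I_0 at x_0 = 0."""
    return 4.0 * math.pi * damped_radial_integral(r, eps)

def I0_equal_time_exact(r, eps):
    """Closed form of the damped integral; its eps -> 0 limit is 2*pi^2/r."""
    return 4.0 * math.pi * math.atan(r / eps) / r
```

Comparing `I0_equal_time(1.0)` with `I0_equal_time_exact(1.0, 0.1)` tests the quadrature, and letting `eps` shrink in the closed form shows the approach to $2\pi^2/|\mathbf{x}|$.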

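Using the results (116), (117) in (113), we get, with reference to (70),
$$\begin{aligned}
b_0(z_j) &= \tfrac14\sum_{i \neq j}e_i\left\{\frac{1}{|\mathbf{z}_j - \mathbf{z}_i + \boldsymbol{\lambda}|} + \frac{1}{|\mathbf{z}_j - \mathbf{z}_i - \boldsymbol{\lambda}|}\right\},\\
b_r(z_j) &= -\tfrac14\sum_{i \neq j}e_i\left\{\frac{(z_{0j} - z_{0i} + \lambda_0)(z_{rj} - z_{ri} + \lambda_r)}{|\mathbf{z}_j - \mathbf{z}_i + \boldsymbol{\lambda}|^3} + \frac{(z_{0j} - z_{0i} - \lambda_0)(z_{rj} - z_{ri} - \lambda_r)}{|\mathbf{z}_j - \mathbf{z}_i - \boldsymbol{\lambda}|^3}\right\}.
\end{aligned} \tag{118}$$
The terms $i = j$ in the sums are zero on account of $(\lambda\lambda) > 0$. These terms would
have been infinitely great if we had put $\lambda = 0$ in (113), so we see here the need for
not passing to the limit $\lambda \to 0$ too early in the theory. However, it is permissible
to put $\lambda = 0$ in (118), so we may take
$$\begin{aligned}
b_0(z_j) &= \tfrac12\sum_{i \neq j}e_i/|\mathbf{z}_j - \mathbf{z}_i|,\\
b_r(z_j) &= -\tfrac12\sum_{i \neq j}e_i(z_{0j} - z_{0i})(z_{rj} - z_{ri})/|\mathbf{z}_j - \mathbf{z}_i|^3.
\end{aligned}$$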
The relativistic form of the theory has been spoilt by the elimination of
the longitudinal waves. There is now not much point in retaining different time
variables $z_{0j}$ for the different particles. By putting all the $z_0$'s equal to $t$ we can
get a further simplification of the equations. We have in the first place $b_r(z_j) = 0$.
We can write the wave equations (114) as
$$i\hbar\,\partial\chi/\partial z_{0j} = H_j\chi, \tag{119}$$
where
$$H_j = e_jb_0(z_j) - (\boldsymbol{\alpha}_j,\; \mathbf{p}_j - e_j\mathbf{M}_{z_j}) - \alpha_{mj}m_j.$$
We then have
$$i\hbar\frac{d}{dt}\big(\chi_{z_0 = t}\big) = i\hbar\Big(\sum_j\frac{\partial\chi}{\partial z_{0j}}\Big)_{z_0 = t} = \sum_jH_j\,\chi_{z_0 = t}. \tag{120}$$
Thus the wave function $\chi_{z_0 = t}$ satisfies one wave equation, in which the Hamiltonian
is the sum of the Hamiltonians in the many-time formulation.
The total contribution of the $b_0(z_j)$ terms to the Hamiltonian $\sum_jH_j$ is
$$\sum_j e_jb_0(z_j) = \sum_{i < j}e_ie_j/|\mathbf{z}_j - \mathbf{z}_i|. \tag{121}$$
This is precisely the Coulomb interaction energy. Thus the longitudinal waves
get replaced by the Coulomb interaction energy in the single-time formulation of
the theory. We can now see the real significance of the longitudinal waves of
the Wentzel field. They are to enable one to bring the Coulomb forces into
electrodynamics in a relativistic manner.
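
The step from the single-particle functions $b_0(z_j)$ to the pair sum (121) is a matter of counting each pair once instead of twice. A short numerical sketch, assuming equal times, $\lambda = 0$ and $b_0(z_j) = \tfrac12\sum_{i \neq j}e_i/|\mathbf{z}_j - \mathbf{z}_i|$; the function names are illustrative only.

```python
import math

def distance(a, b):
    """Euclidean distance between two 3-vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def b0(j, charges, positions):
    """Half the sum over i != j of e_i / |z_j - z_i| (equal times, lambda = 0)."""
    return 0.5 * sum(charges[i] / distance(positions[j], positions[i])
                     for i in range(len(charges)) if i != j)

def coulomb_energy(charges, positions):
    """Sum over pairs i < j of e_i e_j / |z_j - z_i|, the right-hand side of (121)."""
    n = len(charges)
    return sum(charges[i] * charges[j] / distance(positions[j], positions[i])
               for i in range(n) for j in range(i + 1, n))
```

Summing `charges[j] * b0(j, charges, positions)` over all `j` reproduces `coulomb_energy(charges, positions)` for any configuration of distinct points.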
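A further transformation of the wave equation is of interest. Let us put
$$\chi_{z_0 = t} = e^{iH_Ft/\hbar}\,\Psi, \tag{122}$$
where $H_F$ is the Hamiltonian of the field in the absence of charges, given by (41),
and let us consider $\Psi$ as a new wave function. It satisfies the wave equation
$$i\hbar\,d\Psi/dt = \Big(\sum_jH_j^* + H_F\Big)\Psi, \tag{123}$$
where
$$H_j^* = e^{-iH_Ft/\hbar}H_je^{iH_Ft/\hbar}
= e_jb_0(z_j) - (\boldsymbol{\alpha}_j,\; \mathbf{p}_j - e_j\mathbf{M}^*_{z_j}) - \alpha_{mj}m_j,$$
with
$$M_r^*(x) = e^{-iH_Ft/\hbar}M_r(x)\,e^{iH_Ft/\hbar}.$$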
If we express $M_r(x)$ in terms of its Fourier components
$$M_r(x) = \int\{M_{kr}e^{i(kx)} + \bar M_{kr}e^{-i(kx)}\}\,k_0^{-1}\,d^3k, \tag{124}$$
$M_{kr}$ being the part of the three-dimensional vector $A_{kr}$ perpendicular to $k_r$,
then we have, with the help of (42) and (1),
$$M_r^*(t, x_1, x_2, x_3) = \int\{M_{kr}e^{-i(\mathbf{k}, \mathbf{x})} + \bar M_{kr}e^{i(\mathbf{k}, \mathbf{x})}\}\,k_0^{-1}\,d^3k. \tag{125}$$
Thus $M_r^*(t, x_1, x_2, x_3)$ is a function of the $M_{kr}$, $\bar M_{kr}$ not involving $t$, and is
a constant linear operator. The Hamiltonian in the wave equation (123) is
now constant, and the wave equation itself is of the usual form for an isolated
system in non-relativistic theory. Further, the Hamiltonian in (123) is just
what one would get with the non-relativistic theory of §62 if one takes for
$H_P$ in equation (53) of §62 the proper-energy of a set of particles each with
spin $\tfrac12\hbar$, together with their Coulomb interaction energy. This rather surprising
result means that the theory of §62 applied to particles with spin $\tfrac12\hbar$ and with
Coulomb interaction energy is essentially a relativistic theory, leading to physical
consequences which are invariant under Lorentz transformations, in spite of
the form of the theory departing so much from the usual relativistic requirements.


81. Discussion of the transverse waves 
Let us apply the theory of the preceding section to the case of a single particle.
There is then just one wave equation (114) and the terms involving $b$ drop out, so
the wave equation becomes
$$\{p_0 + (\boldsymbol{\alpha}, \mathbf{p}) + \alpha_mm\}\chi\rangle_F = e(\boldsymbol{\alpha}, \mathbf{M}_z)\chi\rangle_F. \tag{126}$$
This is the wave equation for a single particle interacting with
the electromagnetic field. Let us try to get a solution of it on the assumption
that the interaction term in the Hamiltonian, namely $e(\boldsymbol{\alpha}, \mathbf{M}_z)$, is small.
Such a solution would be of the form of a power series in the charge $e$,
$$\chi = \chi_0 + e\chi_1 + e^2\chi_2 + \ldots, \tag{127}$$
where $\chi_0$, $\chi_1$, $\chi_2, \ldots$ are independent of $e$. Substituting (127) in (126) and picking
out terms of different degree in $e$, we get successively
$$\{p_0 + (\boldsymbol{\alpha}, \mathbf{p}) + \alpha_mm\}\chi_0\rangle_F = 0, \tag{128}$$
$$\{p_0 + (\boldsymbol{\alpha}, \mathbf{p}) + \alpha_mm\}\chi_1\rangle_F = (\boldsymbol{\alpha}, \mathbf{M}_z)\chi_0\rangle_F, \tag{129}$$
$$\{p_0 + (\boldsymbol{\alpha}, \mathbf{p}) + \alpha_mm\}\chi_2\rangle_F = (\boldsymbol{\alpha}, \mathbf{M}_z)\chi_1\rangle_F. \tag{130}$$
A solution of (128) corresponding to the particle having the energy and
momentum $p'$, with $(p'p') = m^2$, and no photons present is
$$\chi_0 = e^{-i(p'z)/\hbar}\,|s\rangle, \tag{131}$$
where $|s\rangle$ is a ket in the spin degrees of freedom satisfying
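$$\{p'_0 + (\boldsymbol{\alpha}, \mathbf{p}') + \alpha_mm\}\,|s\rangle = 0. \tag{132}$$
Substituting (131) in (129) and using (124) and
$$\bar M_{kr}\rangle_F = 0, \tag{133}$$
we get
$$\{p_0 + (\boldsymbol{\alpha}, \mathbf{p}) + \alpha_mm\}\chi_1\rangle_F = \int(\boldsymbol{\alpha}, \mathbf{M}_k)\,e^{i(kz) - i(p'z)/\hbar}\,k_0^{-1}\,d^3k\;|s\rangle\rangle_F.$$
To solve this equation for $\chi_1$, we multiply both sides by the operator
$\{p_0 - (\boldsymbol{\alpha}, \mathbf{p}) - \alpha_mm\}$ on the left, which gives
$$\begin{aligned}
\{(pp) - m^2\}\chi_1\rangle_F
&= \{p_0 - (\boldsymbol{\alpha}, \mathbf{p}) - \alpha_mm\}\int(\boldsymbol{\alpha}, \mathbf{M}_k)\,e^{i(kz) - i(p'z)/\hbar}\,k_0^{-1}\,d^3k\;|s\rangle\rangle_F\\
&= \int\{p'_0 - \hbar k_0 - (\boldsymbol{\alpha}, \mathbf{p}' - \hbar\mathbf{k}) - \alpha_mm\}(\boldsymbol{\alpha}, \mathbf{M}_k)\,e^{i(kz) - i(p'z)/\hbar}\,k_0^{-1}\,d^3k\;|s\rangle\rangle_F.
\end{aligned} \tag{134}$$
The operator $\{(pp) - m^2\}$ applied to the integrand here is equivalent to
the multiplying factor
$$(-\hbar k + p',\, -\hbar k + p') - m^2 = -2\hbar(kp'),$$
and hence a solution of (134) is
$$\chi_1 = -\tfrac12\hbar^{-1}\int(kp')^{-1}\{p'_0 - \hbar k_0 - (\boldsymbol{\alpha}, \mathbf{p}' - \hbar\mathbf{k}) - \alpha_mm\}(\boldsymbol{\alpha}, \mathbf{M}_k)\,e^{i(kz) - i(p'z)/\hbar}\,k_0^{-1}\,d^3k\;|s\rangle.$$
This $\chi_1$ is linear in the $M_{kr}$ variables and corresponds to one photon being present.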
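
The multiplying factor used in passing from (134) to $\chi_1$ follows from expanding the four-dimensional scalar product. The only inputs are the conditions already stated: $(p'p') = m^2$ for the particle and $(kk) = 0$ for the photon, since $k_0 = \pm|\mathbf{k}|$.

```latex
\begin{aligned}
(p' - \hbar k,\; p' - \hbar k) - m^2
  &= (p'p') - 2\hbar(kp') + \hbar^2(kk) - m^2 \\
  &= m^2 - 2\hbar(kp') + 0 - m^2 \\
  &= -2\hbar(kp') .
\end{aligned}
```

Dividing the integrand of (134) by this factor is what produces the $(kp')^{-1}$ in $\chi_1$, and it is this denominator that later appears in the divergent integral.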
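Substituting this $\chi_1$ into (130), we see that $\chi_2$ is of the form
$$\chi_2 = \chi_2^{(2)} + \chi_2^{(0)},$$
where
$$\{p_0 + (\boldsymbol{\alpha}, \mathbf{p}) + \alpha_mm\}\chi_2^{(2)}\rangle_F = \int(\boldsymbol{\alpha}, \mathbf{M}_{k'})\,e^{i(k'z)}\,k_0'^{-1}\,d^3k'\;\chi_1\rangle_F, \tag{135}$$
$$\{p_0 + (\boldsymbol{\alpha}, \mathbf{p}) + \alpha_mm\}\chi_2^{(0)}\rangle_F = \int(\boldsymbol{\alpha}, \bar{\mathbf{M}}_{k'})\,e^{-i(k'z)}\,k_0'^{-1}\,d^3k'\;\chi_1\rangle_F. \tag{136}$$
The right-hand side of (135) is quadratic in the $M_{kr}$ variables and leads to
a quadratic $\chi_2^{(2)}$, corresponding to two photons being present, while (136) leads,
as we shall see, to a $\chi_2^{(0)}$ independent of the $M_{kr}$ variables, corresponding to no
photons present.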

The right-hand side of (136) contains terms of the form $\bar M_{k'r}M_{ks}\rangle_F$, so far as
concerns the field variables. Such a term becomes, with the help of (133) and of
the commutation relations (97),
$$\bar M_{k'r}M_{ks}\rangle_F = (\bar M_{k'r}M_{ks} - M_{ks}\bar M_{k'r})\rangle_F
= -(g_{rs}/4\pi^2)\,\hbar k_0\cos(k\lambda)\,\delta_3(\mathbf{k} - \mathbf{k}')\rangle_F$$
if $r$ and $s$ denote directions in three-dimensional space perpendicular to $(k_1, k_2, k_3)$
and either equal or perpendicular to each other. Using this result, the right-hand
side of (136) becomes
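$$-\frac{1}{8\pi^2}\sum_r\iint\alpha_r\{p'_0 - \hbar k_0 - (\boldsymbol{\alpha}, \mathbf{p}' - \hbar\mathbf{k}) - \alpha_mm\}\alpha_r\,(kp')^{-1}
\cos(k\lambda)\,e^{i(kz) - i(k'z) - i(p'z)/\hbar}\,\delta_3(\mathbf{k} - \mathbf{k}')\,k_0'^{-1}\,d^3k\,d^3k'\;|s\rangle\rangle_F, \tag{137}$$
where the summation with respect to $r$ refers to two perpendicular directions for $r$
which are both perpendicular to $(k_1, k_2, k_3)$. The expression (137) reduces to
$$-\frac{1}{8\pi^2}\,e^{-i(p'z)/\hbar}\sum_r\int\alpha_r\{p'_0 - \hbar k_0 - (\boldsymbol{\alpha}, \mathbf{p}' - \hbar\mathbf{k}) - \alpha_mm\}\alpha_r\,(kp')^{-1}\cos(k\lambda)\,k_0^{-1}\,d^3k\;|s\rangle\rangle_F.$$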
This is a divergent integral since it contains, amongst other terms, one involving
$$\int(kp')^{-1}\cos(k\lambda)\,d^3k,$$
which diverges, with $k_0$ given by (98), even before passing to the limit $\lambda \to 0$.
We can conclude that the wave equation (126) has no solution of the form of
a power series in the charge $e$. This conclusion must hold also for the wave
equation for several particles: the transverse electromagnetic waves always lead
to divergent integrals when one tries to get a solution of the form of a power series
in the charges on the particles.

We have here a fundamental difficulty in quantum electrodynamics, a difficulty 
which has not yet been solved.* It may be that the wave equation (126) has 
solutions which are not of the form of a power series in e. Such solutions 
have not yet been found. If they exist they are presumably very complicated. 
Thus even if they exist the theory would not be satisfactory, as we should require 
of a satisfactory theory that its equations have a simple solution for any simple 
physical problem, and the solution of (126) for the trivial problem of the motion 
of a single charged particle in the absence of any incident field of radiation has not 
yet been found. 

Quantum electrodynamics has many satisfactory features in it,
closely analogous to various features in classical electrodynamics. One can get from
it finite and reasonable answers for problems concerning the emission, absorption,
and scattering of radiation whose wavelength is not too short, by cutting off
the divergent integrals at a value for $|\mathbf{k}|$ of the order $2\pi m/e^2$, which cutting
off means physically that the contribution of transverse electromagnetic waves
of wavelength less than $e^2/m$ to the process under investigation is neglected. The
wavelength $e^2/m$ is chosen for the cut-off because it is of the order of the classical
radius of a particle of charge $e$ and mass $m$ on Lorentz's model of the electron.
The cutting off is not a relativistic procedure and can lead to well-defined results
only for problems in which the important wavelengths are considerably greater
than $e^2/m$.
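
To give a sense of scale for this cut-off, the length $e^2/m$ can be evaluated for the electron. The text writes it in units with $c = 1$; the sketch below assumes the conventional SI form $e^2/(4\pi\epsilon_0 mc^2)$ and uses standard values of the constants, neither of which appears in the original.

```python
import math

# Standard SI values (assumptions of this sketch, not taken from the text).
E_CHARGE = 1.602176634e-19      # elementary charge, C
EPS0 = 8.8541878128e-12         # vacuum permittivity, F/m
M_ELECTRON = 9.1093837015e-31   # electron mass, kg
C_LIGHT = 299792458.0           # speed of light, m/s

def classical_radius():
    """Classical electron radius e^2 / (4 pi eps0 m c^2) in metres."""
    return E_CHARGE ** 2 / (4.0 * math.pi * EPS0 * M_ELECTRON * C_LIGHT ** 2)

def cutoff_wave_number():
    """Wave number 2*pi divided by the cut-off wavelength, in inverse metres."""
    return 2.0 * math.pi / classical_radius()
```

The radius comes out close to $2.8 \times 10^{-15}$ m, so the cut-off discards waves far shorter than any atomic dimension.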


*[at the time of publication]
It is probable that some deep-lying changes will have to be made in
the present formalism before it will provide a reliable theory for radiative processes
involving short wavelengths. These changes may correspond to a departure
from the point-charge model of elementary particles which provides the basis
of the present theory. Already in the classical theory the point-charge model
involves some difficulties in interpretation and application,‡ even though it leads
to well-defined equations of motion, as given in §78, so it is not surprising that
the passage to the quantum theory brings in further difficulties.


‡See Dirac, P. A. M. (1938). "Classical Theory of Radiating Electrons." Proceedings of
the Royal Society A: Mathematical, Physical and Engineering Sciences, 167(929), 148-169.
doi:10.1098/rspa.1938.0124, and Eliezer, C. J. (1943). "The hydrogen atom and the classical
theory of radiation." Mathematical Proceedings of the Cambridge Philosophical Society, 39(03),
173. doi:10.1017/s0305004100017850